Enhanced boolean processor with parallel input

ABSTRACT

A relational processor having multiple inputs for receiving and processing parallel words. The relational processor comprises one or more input subsections for converting parallel input data to serial output data. Each of the one or more subsections has a parallel input for receiving the parallel input data and a respective subsection output for outputting the serial output data. A plurality of Boolean processors process the serial output data into processed output data, which plurality of Boolean processors are each operatively connected to the subsection outputs of the one or more input subsections to receive the serial output data. The processed output data is routed with a data routing system which is connected to a processor output of each of the plurality of Boolean processors to route data therefrom to one or more destination circuits. The relational processor processes the input data in a single pass.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Continuation of U.S. Pending application 09/684,761, filed on Oct. 6, 2000, and now U.S. Pat. No. 6,829,695, which is a Continuation-in-Part of pending U.S. patent application Ser. No. 09/389,567 (Atty. Dkt. No. OGPT-24,727) entitled “UNIVERSAL SERIAL BIT STREAM PROCESSOR,” and related to the pending U.S. patent application Ser. No. 09/390,221 entitled “INDEX RELATIONAL PROCESSOR” (Atty. Dkt. No. OGPT-24,573), pending U.S. patent application Ser. No. 09/389,542 and entitled “METHOD AND APPARATUS FOR IMPLEMENTING RUN-LENGTH COMPRESSION” (Atty. Dkt. No. OGPT-24,577), and pending U.S. patent application Ser. No. 09/390,499 and entitled “ASYNCHRONOUS CONCURRENT DUAL-STREAM FIFO” (Attorney Dkt. No. OGPT-24,578), all of which are incorporated by reference herein.

TECHNICAL FIELD OF THE INVENTION

This invention is related to compression/decompression architectures, and more particularly, run-length compression/decompression using Boolean processes.

BACKGROUND OF THE INVENTION

With the proliferation of computer-based data systems in all facets of business, techniques for efficiently handling the potentially large amounts of digital information are becoming increasingly important in a variety of communications and electronic data storage applications. For example, enhanced methods of converting, storing, and searching large databases to retrieve the information may be critical to using such a system. Typically, large databases are structured to reduce the search time associated with finding records in such databases. To expedite large database searches, keys arranged in ordered indices of B-trees may be provided which point to the physical location of each record. This method is much more efficient than a linear approach of searching the database from the beginning to the end when the desired record may happen to be stored near the end of the database.

Additionally, physical data compression techniques are used to reduce hardware costs, data transfer times, and system storage space. Compaction algorithms are especially attractive where large files such as scanned images are stored. Transmission of such large uncompressed files not only displaces available bandwidth, but also requires significantly more storage space. However, a compression/decompression algorithm which is cumbersome to implement may actually offset any gains obtained by compressing the information in the first place. Similarly, when studying the scanning device itself, large amounts of data and respective transmission speeds become important design problems. For example, a facsimile machine scans a document with electro-optical devices line-by-line to generate the electrical data for transmission. However, the amount of data generated from one page in a document can be very large. A sheet of paper the size of A4 may scan to approximately 2 million bits of data which are required to be transmitted and received. Therefore, different methods of transmitting such large files of information have been sought for more efficient and faster transmission of facsimile information.

Run-length compression is a popular data compression technique which provides significant data compression for repeating characters or patterns. It uses very simple compression and decompression algorithms. Most run-length compression schemes are based on Huffman entropy coding techniques. A Huffman code is a lossless data compression algorithm which uses a small number of bits to encode common characters. Huffman coding approximates the probability for each character as a power of 1/2 to avoid complications associated with using a nonintegral number of bits to encode characters using their actual probabilities. The Huffman algorithm converts characters into bit strings using a binary tree containing all possible characters. The Huffman code for a character may be obtained by traversing the tree, where if a left branch is chosen the bit is 0; if a right branch is taken the bit is 1. Huffman compression is a statistical data compression technique which gives a reduction in the average code length used to represent the symbols of an alphabet. A Huffman code can be made by (1) ranking all symbols in order of probability of occurrence, (2) successively combining the two symbols of the lowest probability to form a new composite symbol, eventually building a binary tree where each node is the probability of all nodes beneath it, and (3) tracing a path to each leaf, noting the direction at each node.
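The three construction steps described above may be illustrated with the following minimal sketch. It is not taken from the disclosure; the function name `huffman_codes` and the use of a priority queue are assumptions made for illustration, and symbols are assumed to be plain strings (not tuples).

```python
import heapq
from itertools import count


def huffman_codes(freqs):
    """Build a {symbol: bit string} table from a {symbol: frequency} mapping.

    Illustrative only: symbols are assumed to be strings, never tuples,
    because tuples are used here to represent internal tree nodes.
    """
    if not freqs:
        raise ValueError("at least one symbol is required")
    tie = count()  # tie-breaker so the heap never compares tree nodes
    # Step (1): rank symbols by frequency (the heap keeps the lowest on top).
    heap = [(weight, next(tie), symbol) for symbol, weight in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    # Step (2): repeatedly combine the two lowest-weight entries.
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (left, right)))
    # Step (3): trace a path to each leaf; left branch = 0, right branch = 1.
    codes = {}

    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            codes[tree] = prefix

    walk(heap[0][2], "")
    return codes
```

The most frequent symbol receives the shortest bit string, and no code is a prefix of another, which is what allows the decoder to walk the tree bit by bit.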

It can be shown mathematically that Huffman coding will give an optimum compression factor based on the symbol frequency distribution (entropy). However, Huffman coding does suffer from a key drawback: two passes through the data file are required. The first pass through the data file collects the frequency of occurrence for each run length for both streams of ones or zeros. With the list of the occurrence frequencies, a variable-length code set is developed to “remap” the input file. The second pass applies the remap codes to the data file creating a new compressed file. The two-pass approach requires that a conversion key be stored with the compressed data. The required two passes through the input file represent a serious impediment to high throughput computing.
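The first of the two passes, collecting the frequency of each run length, may be sketched as follows. The helper names `run_lengths` and `run_length_frequencies` are hypothetical, and the bit stream is assumed to be a simple sequence of 0 and 1 values.

```python
from collections import Counter
from itertools import groupby


def run_lengths(bits):
    """Split a bit sequence into (bit value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(bits)]


def run_length_frequencies(bits):
    """First pass: count how often each (bit value, run length) pair occurs."""
    return Counter(run_lengths(bits))
```

A table such as the one returned by `run_length_frequencies` would then drive construction of the variable-length code set before the second pass could begin, which is why the whole input must be read twice.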

Furthermore, recursive operations on bit streams (e.g., database threads) are very advantageous in arriving at a final search result. However, recursive operations require that the intermediate results (also called an intermediate vector) of a partial Boolean operation be kept locally (e.g., stored in a memory buffer) for reuse in the generation of another partial or final Boolean operation. (The binary bit stream may be compressed or uncompressed.) The processing of a binary bit stream is serial in nature. Thus a first-in/first-out (FIFO) device is a logical choice for the memory buffer. A FIFO can be loosely described as a data “pipe” that flows in one direction from the input to the output, and can hold a specific amount of information bits.

A requirement of the FIFO for use in the recursion process is that it have two alternating memory (also called “ping-pong”) buffers. Ping-pong buffers alternate respective functions in the processing and retention of intermediate data stream results. For example, if buffer “A” is collecting the current processing results and buffer “B” is feeding its output as input to the Boolean processor from the last iteration, then once processing is complete for the current iteration, the buffers will reverse roles, where buffer “A” is the input to the Boolean processor and “B” is storing the results. It can be seen that the buffers will alternate or ping-pong.
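The role reversal of the two buffers may be modeled in software as in the sketch below. This is only an illustration of the alternation, not the disclosed hardware; the class name `PingPongBuffers` and its method names are assumptions.

```python
from collections import deque


class PingPongBuffers:
    """Two FIFOs that swap roles at the end of each processing iteration."""

    def __init__(self):
        self._buffers = (deque(), deque())
        self._collecting = 0  # index of the buffer storing current results

    def store(self, bit):
        """Append a result bit to the collecting buffer."""
        self._buffers[self._collecting].append(bit)

    def feed(self):
        """Remove and return the oldest bit of the previous iteration."""
        return self._buffers[1 - self._collecting].popleft()

    def pending(self):
        """Number of bits still available from the previous iteration."""
        return len(self._buffers[1 - self._collecting])

    def swap(self):
        """Reverse roles: discard leftovers, then feed what was just collected."""
        self._buffers[1 - self._collecting].clear()
        self._collecting = 1 - self._collecting
```

After each call to `swap`, the results stored during the iteration just completed become the input of the next one, while the other buffer starts collecting from empty.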

A final requirement for the memory buffer is that it must be large enough to hold the binary streams associated with the threads from a large database. The semiconductor industry has developed numerous FIFO chip solutions. However, classical FIFOs are optimized for speed and not for memory size. This is primarily due to the popular use as elastic buffers for disk and high speed communications systems. The great advantage in using these “off-the-shelf” FIFOs is that all the elements for the FIFO are contained in one integrated circuit. The FIFO integrated circuits are also cascadeable so that larger buffer sizes can be created. Unfortunately, the largest size of a classical FIFO (e.g., 64 KB) is insufficient for use with the disclosed relational engine. The disclosed architecture requires at least 16 MB for the buffer. Therefore, a hybrid solution is required.

SUMMARY OF THE INVENTION

The invention disclosed and claimed herein, in one aspect thereof, is a relational processor comprising one or more input subsections for converting parallel input data to serial output data. Each of the one or more subsections has a parallel input for receiving the parallel input data and a respective subsection output for outputting the serial output data. A plurality of Boolean processors process the serial output data into processed output data, which plurality of Boolean processors are each operatively connected to the subsection outputs of the one or more input subsections to receive the serial output data. The processed output data is routed with a data routing system which is connected to a processor output of each of the plurality of Boolean processors to route data therefrom to one or more destination circuits. The relational processor processes the input data in a single pass.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following description taken in conjunction with the accompanying Drawings in which:

FIG. 1 illustrates a conceptual block diagram of the process of querying a database and outputting a result;

FIG. 2 illustrates a process of performing a bit-by-bit logical operation on collections to arrive at an intermediate query result;

FIG. 3 illustrates the composition of a super collection;

FIG. 4 illustrates a general block diagram of the relational engine according to a disclosed embodiment;

FIG. 5 illustrates a more detailed block diagram of the relational engine of FIG. 4;

FIGS. 6A and 6B illustrate a flowchart of the potential signal paths offered by the disclosed relational engine;

FIG. 7 illustrates a block diagram of the relational engine core subsystem;

FIG. 8 illustrates a block diagram of one of four input process subsections shown in FIG. 5;

FIG. 9 illustrates a digital signal processing subsection, as illustrated in FIG. 5;

FIG. 10A illustrates a general block diagram of a configurable Boolean stream processor as a bit-addressable memory;

FIG. 10B illustrates a general block diagram of a configurable Boolean stream processor as illustrated in FIG. 5;

FIG. 11 illustrates a concurrent FIFO definition block diagram;

FIG. 12 illustrates the concurrent FIFO status logic;

FIG. 13 illustrates a timing generator and associated gate control logic of the concurrent FIFO;

FIG. 14 illustrates the concurrent FIFO counter and DRAM I/O coupling;

FIG. 15 illustrates a conventional Huffman coding tree;

FIG. 16 illustrates a detailed bit layout of the comma codes;

FIG. 17 illustrates a detailed breakout of a sample raw bit stream and its encoded counterpart;

FIG. 18 illustrates an unbalanced decoding tree according to the disclosed embodiment;

FIG. 19 illustrates a flowchart which defines processing for raw bit-stream encoding;

FIG. 20 illustrates a sequence of steps for run-length processing as a subroutine of the main encoding function;

FIG. 21 illustrates the processing for the decode process;

FIG. 22 illustrates a relational engine system;

FIG. 23 illustrates an alternative embodiment where all input channels have decompression capabilities;

FIG. 24 illustrates a general block diagram of the processor of FIG. 5;

FIG. 25A illustrates a general block diagram of an enhanced Boolean processor having multiple 32-bit parallel inputs, according to a disclosed embodiment;

FIG. 25B illustrates a more detailed block diagram of the disclosed parallel architecture;

FIG. 25C illustrates a high-level overview of the Boolean processing embodiment;

FIG. 25D illustrates a general conversion process, according to a disclosed embodiment;

FIG. 26 illustrates the basic data flow of the thread-to-collection conversion process;

FIG. 27 illustrates a simplified data structure of a 32-bit thread in a primary FIFO;

FIG. 28 illustrates a block diagram of a simplified fragment converter and bit accumulator;

FIG. 29 illustrates a block diagram of a simplified Tag ID comparator of FIG. 28;

FIG. 30 illustrates a block diagram of the 32-bit accumulator of FIG. 28;

FIG. 31 illustrates a simplified data structure of a 58-bit word in a secondary FIFO;

FIG. 32 illustrates a block diagram of a fragment-to-collection converter at the output of the secondary FIFO of FIG. 28;

FIGS. 33A and 33B illustrate a flowchart of the thread-to-collection conversion process;

FIG. 34A illustrates a basic thread structure and its constituent ID components, according to the process steps of the flowchart of FIGS. 33A and 33B;

FIG. 34B illustrates the contents of various registers according to the process steps of the flowchart of FIGS. 33A and 33B;

FIG. 35 illustrates a conversion of the bit binary structure of the 32-bit fragments to the collection output data stream;

FIG. 36 illustrates the comma codes used with the enhanced Boolean processor embodiment; and

FIG. 37 illustrates a system of enhanced Boolean processors.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to FIG. 1, there is illustrated a block diagram of a system for processing a database query. There is provided a database 100, which contains data to be queried. This data is queried and the results output therefrom on an output 101 in an operation that will be described hereinbelow. This is facilitated by a query engine 103. As will be described hereinbelow, the query engine 103 does not require the number of records in the database 100 to be a fixed predetermined number established at the creation of the database, as in prior art systems (such that the number of records grows to a fixed record limit). For example, a prior art system may have a predetermined ceiling of one million records for a database. Having an actual database of 10,000 records still results in searching the 10,000 records in order to obtain the results. However, having an upper limit of one million records translates into one million bits which must be processed in order to obtain the query results over 10,000 records. The query engine 103 is universal in that it is compatible with other database structures and with a database having no fixed upper limit in the number of records, but that grows as the number of records increases. For example, a database having an initial number of 10,000 records results in a bit stream of 10,000 bits. Adding 5,000 more records simply means dealing with a resulting bit stream of 15,000 bits.

The query engine 103 is a relational database system operating under the regime of relational processing. A database is simply a list of records which have associated therewith an absolute record index value. Within each record is a set of key fields. For example, the key fields may comprise a name, address, telephone number, state, zip code, age, hair color, sex, etc. The key fields are defined by the particular business creating the database. A new record entry is simply appended to the end of the current string of records which comprise the database. A business having 10,000 employees has a database of at least 10,000 records.

The database 100, structured to operate according to the disclosed embodiment, adds new records by placing them at the “end” of the record storage area. Using this storage technique, a record maintains its same relative position from the start or beginning of the record storage area. The “distance” or number of records away from the beginning is referred to as the record index value. This positional invariance is the key issue for processing field relational data using the disclosed architecture, as will be discussed hereinbelow.

When querying the database 100 records, the fields desired to be searched are known. A result of the query is a binary tree (B-tree) for a particular key field. (Note that the query result for a disclosed embodiment is a B-tree. However, B-trees are not the only way to handle storage issues.) Each field type in a record is converted into a balanced binary tree with nodes of the tree defining each field match possibility. A simple example is the value for the key field of sex which results in a B-tree having only two nodes being created (male and female). A more complex example would be the state of residence, which could have up to 50 nodes. Thus, tree nodes vary in complexity based on the variation of record field content. Associated with a tree node is a list of record indexes that identify those records in the database that match the criteria specified for the tree node. For example, if the 33rd, 86th, and 10,000th records have Texas in their state fields, the thread for that Texas node will have the respective integer values of 33, 86, and 10,000, listed in ascending order. This list is referred to as a “thread.” A thread is further defined as a list of 32-bit integers (index values) which are sorted in ascending order.

By way of example, a database having one million records is queried for the following match criteria: sex of male, color of hair as red, age 25, marital status of single, and state of residence as New Jersey. Each item in the criteria list represents a “match” item for a key field within a record. The record may contain many more fields, but these are the only fields selected for a match. The following B-trees and associated nodes searched are listed in Table 1.

TABLE 1

B-Tree                Node in Tree
Sex                   Male
Hair Color            Red
Age                   25
Marital Status        Single
State of Residence    New Jersey

When searching a tree, a matching process occurs while “walking” down the tree from the top node. For example, if the tree consists of all the States of Residence in the United States (there being fifty), and the search requests the records related to New Jersey, the matching process walks down the tree of fifty nodes until a successful match occurs with New Jersey. The integer list of records associated with the New Jersey node is then accessed to obtain all the records having New Jersey in the key field of State of Residence. The addition of a new record having New Jersey as a state (a record not a part of the earlier tree) requires the New Jersey tree to be modified to include the record index in its thread.

For this relational example, the node threads are then logically combined (ANDed) with each other. The logical combination is performed by converting a record index of the thread to a bit position in a contiguous binary data stream (or collection) for each thread. A collection is a vector that represents the result of a query, and is compressed by removing one or more bits from the input bit stream for transmission. The process of converting a thread to a collection performs a mapping function of a record location to a bit position. A collection is the unit of information exchanged between a client and a server. The client and server may be connected across a LAN, WAN, or even a global communication network such as the Internet. When the client receives the compressed collection, it decompresses (or expands) the collection and requests of the server only those records meeting the search criteria. This represents minimal intrusion on the database, thereby saving transmission bandwidth. (In prior art systems, the client would receive a one-million bit word, instead of, for example, a compressed 30-byte word.) The client may perform further queries on the collection received from the server. Each binary data stream for each thread has the same length. Due to the invariance of the record position in the database, the bits in the independent data streams can be logically combined using the laws of Boolean algebra. It is the performance of this process that the disclosed accelerator architecture is designed to improve.

By way of overall operation, the disclosed query engine 103 is organized around the concept of querying databases, the result of which combines lists of 32-bit integers, referred to as threads. A database query results in the construction of one or more balanced binary trees each associated with a key field (e.g., state, sex, age). Each binary tree has one or more tree nodes representative of the query criteria (e.g., New Jersey, male, 25). Each tree node has associated therewith the thread. The thread is a list of 32-bit integers representing the record indexes of all records of the database having the particular query criteria, and which integers are sorted in ascending order (e.g., 33, 57, 634). In order to use the thread integers for relational processing, they need to be transformed into bit positions in a contiguous binary stream. A database query invokes a process where individual threads are logically combined to produce this contiguous binary bit stream referred to as a collection.
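The mapping of a record index to a bit position may be sketched in software as follows. This is an illustrative model only, not the disclosed hardware conversion; the function names are hypothetical, and record indexes are assumed to be 1-based, so that the 33rd record maps to the 33rd bit position.

```python
def thread_to_collection(thread, num_records):
    """Map a sorted list of 1-based record indexes to a list of bits.

    Assumption for illustration: bit position k (1-based) corresponds to
    record index k, and the collection has one bit per database record.
    """
    bits = [0] * num_records
    for index in thread:
        if not 1 <= index <= num_records:
            raise ValueError(f"record index {index} is out of range")
        bits[index - 1] = 1
    return bits


def collection_to_thread(bits):
    """Inverse mapping: list the 1-based positions of all set bits."""
    return [position + 1 for position, bit in enumerate(bits) if bit]
```

Because every collection built this way has the same length and the same record-to-position mapping, any two of them can be combined position by position.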

Referring further to FIG. 1, the database 100 is operable to contain a large number of records having associated therewith key fields. If, for example, the records were those of employees, the key fields may provide information related to the name of the employee, address, state, zip, sex, marital status, etc. If the database of employees were to be queried in order to find all employees over the age of forty, the query would return an index file 102 which lists all the records of the database of employees which meet the search criteria of having an age greater than forty.

The index file is simply a list of integers of the record locations that match the search criteria. This index file or thread is then input to a collection generator 104 for processing into a collection. The collection generator 104 is a multi-input block for handling one or more thread inputs. For example, if the employee database is queried for all employees over the age of forty and having a marital status of single, the resulting index files 102 would number two, that being one index file for all those database records matching the search criteria for marital status of single and another index file having all the database records which match the search criteria of over forty. These two index files are then input to the collection generator 104 resulting in two collections at the output of the collection generator 104. A collection is defined as a bit stream of records matching the search criteria. The total number of bits in the collection equals the total number of records in the database. A zero value in any of the bit positions of the collection indicates an unsuccessful match of the search criteria, and a one in any bit position of the collection represents a successful match of the match criteria. Therefore, by performing a Boolean operation on the one or more collections, one can obtain all records of a database that meet the overall search criteria.

The number of threads input to the collection generator 104 is only limited by the number of search criteria. Therefore, the output of collection generator 104 is one or more collections (collection 1, 2, . . . , N, also referenced as 106, 108, and 110, respectively). The one or more collections 106, 108, and 110 are then input to Boolean processor 112 where the desired Boolean operations are performed on the collections 106, 108 and 110. For example, if collection 1 represented all employees over the age of forty and collection 2 represented all employees with a marital status of single, to find all the employees who are over forty and single, a Boolean AND operation is performed on each bit position of the collections 106 and 108. The output of the Boolean processor 112 is a single bit stream which may be fed to any subsequent processing operation such as to a “compander” subsection 114 for compression (which may be a digital signal processor (DSP)), or the results may be fed back recursively into the input of the Boolean processor 112. (The use of the term compander denotes a dual function performed by the DSP of compressing data and expanding data.) Note that the companding process may be hardware-based or software-based. The compander 114 is run-time configurable where software updates can be downloaded from a host system, if desired. The compander 114 can accept configuration commands from the hardware initialization stream. This allows fixes or modifications to the compander 114 functions as well as physical processing logic by updating the host data files.

Recursive operation is useful where the number of search criteria exceeds the available input channels of the relational engine architecture. In the recursive operation, information from an initial query may be processed and the results fed back in at the input for additional processing with other query information. The final results of the recursive operation may be fed to a compander subsection 114 for compression. Alternatively, in the scenario where one may be querying multiple databases, a multi-database collection control block 118 is useful in maintaining bit stream synchronization. For example, the database 100 is queried and the resulting index file 102 is accessed. Similarly, another database 120, be it local or remote, is queried and a resulting index file 122 is accessed. To exploit transmission bandwidth more effectively, the resulting collections created by the query of database 100 and database 120 may be shipped to the relational engine in compressed format. Carrying the example further, if the information of interest resides only in the bit stream of database 120, this information of interest can be obtained by decompressing the total collection bit stream using compander subsection 114 and selectively obtaining only that desired information using the multi-database collection control block 118. That desired information related to database 120 is then input to the Boolean processor 112 along with any other collection information.
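The recursive case, in which more search criteria exist than input channels, may be sketched as below. The sketch assumes four input channels and assumes that the fed-back intermediate result is combined alongside up to four new inputs on each pass; how the recursion path shares the inputs in the actual hardware is not stated here, and the names `one_pass` and `recursive_and` are hypothetical.

```python
from functools import reduce
import operator

NUM_CHANNELS = 4  # assumed number of input channels per pass


def one_pass(inputs, feedback=None):
    """AND the given collections, plus an optional fed-back result, bit by bit."""
    streams = list(inputs) if feedback is None else [feedback, *inputs]
    return [reduce(operator.and_, column) for column in zip(*streams)]


def recursive_and(collections):
    """Combine any number of equal-length collections, NUM_CHANNELS at a time."""
    if not collections:
        raise ValueError("at least one collection is required")
    result = None
    for start in range(0, len(collections), NUM_CHANNELS):
        batch = collections[start:start + NUM_CHANNELS]
        # The intermediate result of the previous pass is fed back as input.
        result = one_pass(batch, feedback=result)
    return result
```

With six collections, the first pass combines four of them and the second pass combines the remaining two with the intermediate result.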

Referring now to FIG. 2, there is illustrated a process for performing a Boolean operation on respective bit positions of collections, a collection referring to a sequence of bit positions, each position corresponding to a record. If the bit position is high (a logic “1”), then the associated record is part of the collection. If it is low (a logic “0”), it is not a part of the collection. If there are “N” records in the database, then there are “N” bit positions in every possible collection. For example, a first collection 200 comprises a number of bit positions 1, 2, . . . , N and a second collection 202 comprises a similar number of bit positions 1, 2, . . . , N. The collections 200 and 202 have the same number of bit positions, which represents the number of database records in the database 100. As mentioned hereinabove, a value of one in any bit position of any collection represents a successful match of the search criteria with the database and a value of zero indicates an unsuccessful match against any of the search criteria. The Boolean operator 204 can be any of a wide variety of Boolean logical functions including AND, XOR, OR, etc. The output of the Boolean operator 204 results in a third collection 206 having the same number of bit positions as the first collection 200 and the second collection 202. The values placed in the bit positions of the third collection 206 are the results of the Boolean operation performed on that particular bit position of the first and second collections 200 and 202, respectively. For example, if the Boolean operator 204 was an AND function, the logical AND operation is performed on the first bit positions of both the first collection 200 and the second collection 202. The resulting output is placed in the first bit position of the third collection 206. Similarly, the AND operation is performed on the second bit positions of the first collection 200 and the second collection 202 with the result being placed in the second bit position of the third collection 206. The logical operation is then performed likewise on all bit positions through the Nth bit position of the collections.
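The bit-by-bit operation of FIG. 2 may be modeled as in the following sketch, in which the operator is passed in as a parameter. The function name `combine` is hypothetical, and collections are represented as lists of 0 and 1 values for illustration.

```python
import operator


def combine(boolean_op, first, second):
    """Apply a two-input Boolean operator to each pair of bit positions.

    Both collections must have the same number of bit positions, one per
    database record; the result has that same length.
    """
    if len(first) != len(second):
        raise ValueError("collections must have the same number of bit positions")
    return [boolean_op(a, b) for a, b in zip(first, second)]


# Example usage with the standard operator module:
over_forty = [1, 1, 0, 0]
single = [1, 0, 1, 0]
over_forty_and_single = combine(operator.and_, over_forty, single)
```

Passing `operator.or_` or `operator.xor` in place of `operator.and_` yields the OR and XOR functions mentioned above without changing the loop.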

Referring now to FIG. 3, there is illustrated a diagram of a bit stream of collections. The bit stream of collections is known as a super collection and, in this particular example, includes two collections from database A, collection 300 and collection 304, having a collection 302 from database B placed therebetween. As mentioned hereinabove, to enhance the effectiveness of available bandwidth, data compression is used wherever possible. Therefore, a super collection would normally be compressed and to obtain any information of collection 302 from database B, the entire bit stream would need to be decompressed using compander subsection 114 and input to the multi-database collection control block 118. The multi-database collection control block 118 performs the necessary offset and synchronization to obtain the collection 302 of database B for processing.
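The offset step may be pictured with the sketch below. It assumes that the decompressed super collection is a simple concatenation of its constituent collections and that the length of each constituent is known in advance; neither detail is specified in this passage, and the function name `extract_collection` is hypothetical.

```python
def extract_collection(super_collection, lengths, position):
    """Return the constituent collection at the given 0-based position.

    Assumptions for illustration: the super collection is a plain
    concatenation, and lengths[i] is the bit count of constituent i.
    """
    if sum(lengths) != len(super_collection):
        raise ValueError("lengths do not add up to the super collection size")
    offset = sum(lengths[:position])
    return super_collection[offset:offset + lengths[position]]
```

For a super collection laid out as in FIG. 3, position 1 would select the middle collection from database B, skipping the bits contributed by the first collection from database A.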

Referring now to FIG. 4, there is illustrated a general block diagram of the relational engine. The relational engine circuitry 400 interfaces to a PCI bus 402 via a PCI bridge circuit 404. The peripheral component interconnect (PCI) bus architecture is commonly found in a wide variety of personal computers and workstations. The PCI bus is a 32-bit wide local bus employed in many personal computers and workstations for the transfer of data between the PC's main CPU and periphery, such as hard disks, video cards or adapters, etc. Effective transfer speeds across the PCI bus 402 may reach up to 132 megabytes per second. (It should be noted that this architecture is not limited to a PCI bus architecture but is applicable to any architecture which provides the proper interfacing to the relational engine circuitry 400.) The relational engine circuitry 400 interfaces through the PCI bridge 404 to a CPU 406 on the PCI bus 402. The CPU 406 has associated with it a memory 408 for storing data and furthermore, has associated with it and attached to the PCI bus 402, a storage unit 410 for the mass storage of files, including a database of records. A user wishing to query the database of records stored in storage unit 410 enters the key field information into the CPU 406. The CPU 406 then performs the query and places the query results into the memory 408. The relational engine circuitry 400 then retrieves the search results directly from the memory 408 through a direct memory access (DMA) process across the PCI bus 402 along a path 412 to memory 408, or indirectly through the CPU 406. Note that the disclosed architecture is not limited to DMA but may incorporate any memory-accessing process.

The search results are 32-bit words representing the integer values of record indexes of records meeting the match query criteria. The 32-bit binary words are input into an input channels block 414 which comprises one or more input channels (four in this embodiment). The input channels block 414 incorporates the circuitry to transform the 32-bit binary words into either serial bit streams of integers or what are called “collections.” The serial bit stream at the output of the input channels block 414 is routed to a relational processor block 416. The relational processor block 416 performs Boolean operations on the one or more collections or integers received from the input channels block 414. The output of the relational processor block 416 is a serial bit stream which is converted back to a 32-bit parallel word, and may be routed back through the PCI bridge 404 across the PCI bus 402 to the CPU 406.

Alternatively, the output serial bit stream of relational processor 416 may be routed to a compander subsection 418 (similar in operation to compander subsection 114). The compander subsection 418 performs a function of compressing the output bit stream of the relational processor block 416 and placing the compressed output onto a bus 420 for transmission back to CPU 406. The compander subsection 418 also performs an expansion function whereby compressed data input to one or more of the channels of the input channels block 414 may be expanded (or decompressed) by inputting the compressed signal directly into compander subsection 418. The decompressed bit stream is then fed into the relational processor 416 for Boolean processing.

If further processing on threads is desired, the results of the Boolean processing performed by the relational processor 416 may be fed back into a recursion channel 422 to perform further Boolean operations in conjunction with, perhaps, original input channel binary strings. The core architecture of the recursion channel 422 is a concurrent FIFO circuit, which will be discussed in greater detail hereinbelow. A control register 424 monitors activity on the bus 420 and provides appropriate control signals to various other points in the relational engine circuit 400. Similarly, a status register block 426 receives status input from the various points of the relational engine circuit 400 and places the status signals back onto the bus 420 for access and processing by all circuits. A timing and control generator 428 receives a 100 MHz clock signal from an oscillator 430 to provide the necessary timing and synchronization parameters for operation of the relational engine circuit 400.

Referring now to FIG. 5, there is illustrated a more detailed block diagram of the relational engine of FIG. 4. The disclosed relational engine architecture 400 receives 32-bit words comprising threads and collections at one or more of four input subsections. The thread and collection 32-bit words are compatible with the PCI 32-bit bus system. The thirty-two bits can represent either a chunk of thirty-two bits of the overall bit stream or a 32-bit integer, which is predefined so that the channels know what the 32-bit data word is. The threads and collections, which are stored on a storage unit 410 (illustrated in FIG. 4), are passed over the PCI bus 402 through PCI interface 404 onto the 32-bit wide bus 420 and into the channel subsections. Each input channel consists of a FIFO and some control bits which define whether the input bit stream is a collection or integers. Threads are always assumed to be sorted in ascending order since relational processing occurs from the lowest to the highest bit values. Each input channel A through D operates on a separate 32-bit word, which words are fed in a parallel fashion into the inputs of channels A through D. Channels can be mixed in the sense that channels A and B could be processing collections while channels C and D could be processing threads. The outputs of channels A through D are serial bit streams. The four serial outputs run in parallel into the relational processor 416.

Operation of the relational processor 416 is flexible in that the bit stream can be converted back into a parallel word using the serial-to-parallel converter 516 or the bit stream can be converted back into a thread using the bit position-to-integer converter 518. The output of the bit position-to-integer converter 518 is a 32-bit word shipped over a 32-bit wide bus which is passed to a two-to-one multiplexer 520. In each of the four channels A through D, there is an elastic storage capability (FIFO) for providing buffering of the input to the subsection. Furthermore, a FIFO block 522 receives the output of the two-to-one multiplexer 520 and provides some 32-bit word control leading into the dual concurrent FIFO block 524 which will be discussed in greater detail hereinbelow. Channels C and D have special processing capability.

Channel C is the only channel, in this particular embodiment, which can do decompression. (It can be appreciated that any or all of the input channel subsections could be designed to accommodate decompression, as will be disclosed in greater detail hereinbelow with the enhanced Boolean processor.) The compander subsection 418 handles the decompression function which feeds into channel C through a multiplexer 526. Multiplexer 526 has two inputs, one input which receives 32-bit words off of bus 420 and the other input which receives the output of the compander subsection 418. Data brought in across the PCI bus 402 and bus 420 may be input through a FIFO block 528 and into the compander subsection 418 for decompression. The decompressed data is then output from compander subsection 418 across a 32-bit wide bus 530 to the multiplexer 526 for input to the channel C subsection 504. The compander subsection 418 also provides a compression function, hence the term “compander.” Therefore, the output of compander subsection 418 is either compressed or decompressed (expanded) based upon a selection via port MR4 of mode register 536, so the bit stream at the output of relational processor 416 can be either compressed for transmission back across the PCI bus 402, or fed back into channel C subsection 504 for further processing.

In this particular embodiment of FIG. 5, channel D provides a recursion capability and its path is from the relational processor 416 through the two-to-one multiplexer 520 through the FIFO 522 and on through the dual concurrent FIFO 524 to the input of the D channel subsection 506. An intermediate value can be stored in the dual concurrent FIFO 524. Therefore, intermediate values need not be placed back out on the PCI bus 402 for processing by the main CPU 406; instead, intermediate processing is performed away from the PCI bus 402 in the relational engine circuit 400, and I/O traffic of the PCI bus 402 is kept to a minimum. The intermediate value is essentially the output bit stream of the relational processor 416 stored momentarily in the dual concurrent FIFO 524. It is called an “intermediate value” since the value is ultimately fed back into the channel D subsection 506 for further processing with one or more of the other channel subsections 500, 502 or 504 to arrive at an ultimate value. In contrast, the enhanced Boolean processor disclosed in greater detail hereinbelow with respect to FIG. 25 provides that each channel now has the capability of data recursion.

The dual concurrent FIFO 524 is a 64-Megabyte memory in this embodiment, although the size of the memory is arbitrary and can be changed to fit the needs of the relational engine circuit 400. The output of the dual concurrent FIFO 524 is, in one instance, input to a 2-to-1 multiplexer 532 and passed through a 4K FIFO 534 for placement on the PCI bus 402. Alternatively, the output of the dual concurrent FIFO 524 is redirected back to the input of Channel D 506 for recursive processing of the data. A mode control register 536 provides mode control for most circuits of the relational engine 400 requiring such control. Additionally, a state control circuit 538 clocked by a master oscillator provides the timing and synchronization control for read/writes and counter incrementing of all counters and registers of the relational engine 400. For example, an integer counter 546 which receives control from the state control circuit 538 provides an input to each of the four input process subsections 500, 502, 504, and 506.

Another provided capability is that of simply counting the number of records which matched the search criteria. A record counter circuit 540 monitors the output serial bit stream of the relational processor 416 and counts the “1” bits. The count value is then passed back to the CPU 406 over the PCI bus 402.
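By way of illustration only, the counting operation described above may be sketched in software as follows. The function name count_records and the sample bit stream are hypothetical and are not part of the disclosed circuit; the sketch merely models a counter that increments on each “1” bit of the serial output.

```python
def count_records(bit_stream):
    """Count the '1' bits in a serial bit stream (an iterable of 0/1 values)."""
    count = 0
    for bit in bit_stream:
        if bit == 1:
            count += 1          # one matching record per '1' bit
    return count


# Made-up output collection in which four records matched the query.
result_collection = [0, 1, 1, 0, 1, 0, 0, 1]
matches = count_records(result_collection)
```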

The relational engine 400 also comprises the capability of windowing. When two or more collections are compressed into a continuous bit stream, a super collection is created. It may be desirable to process only one of the collections of the super collection. Collections can become super collections where, for example, a collection of sales records is concatenated with a collection of inventory records. A problem with concatenated collections is that the offset needs to be known to arrive at a particular record. Since the collections are compressed, word alignment no longer exists. To address this problem, “windowing” circuitry is utilized to provide for focusing in on the particular collection desired. For example, a super collection of three collections C1, C2 and C3 each having 10,000 bits is compressed down into a single bit stream. In order to access the collection C2, the super collection must first be decompressed.

Windowing provides the capability of offsetting a counter value to access and process only a particular collection in the bit stream. In this particular example, the windowing circuitry provides an offset of 10,000 to arrive at the starting point of collection C2 (being the second collection in the string of three collections). At this point, the C2 collection can be operated on to simply read records or to modify the collection by recursively processing it, followed by compression back into a super collection. Alternatively, the decompressed collection C2 can be operated on and left in its decompressed (raw) state in the bit stream with compressed collections C1 and C3 on either side of it. Similarly, if a compressed super collection follows an integer bit stream, the offset mechanism also provides for offsetting the bit stream by the length of the integer stream to process the super collection data. A window control circuit 542 having a binary counter 544 connected thereto allows this to occur.
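The windowing example above may be sketched as follows, assuming an already decompressed super collection and 1-based bit positions. The function name select_window and the sample data are hypothetical; the loop counter merely stands in for a binary counter compared against start and end values.

```python
def select_window(bit_stream, start, end):
    """Return the bits whose 1-based position lies within [start, end]."""
    selected = []
    counter = 0                       # stands in for a binary position counter
    for bit in bit_stream:
        counter += 1
        if start <= counter <= end:   # start and end comparisons
            selected.append(bit)
    return selected


# Three decompressed collections of 10,000 bits each, appended together.
c1 = [0] * 10_000
c2 = [0] * 10_000
c2[41] = 1                            # made-up data: record 42 of C2 matched
c3 = [1] * 10_000
super_collection = c1 + c2 + c3

# An offset of 10,000 places the window at positions 10,001 through 20,000.
offset = 10_000
window_c2 = select_window(super_collection, offset + 1, offset + 10_000)
```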

The disclosed architecture comprises one or more input channels (500, 502, 504, and 506), a Boolean relational processor 416, and an input channel 506 which doubles as a recursion channel for optional use. Alternatively, the data can be brought back out as integers, raw collections, or compressed collections. Control registers define the nature of the bit stream. The integer stream is more than simply a binary bit stream; it defines a particular value or specific address of a record number in the database. The relational engine has the added capabilities of windowing and of counting, using a record counter, the number of bits which indicate the records matching the search criteria. The counter of the record count block 540 is a read-only device which can be read at any time and reset at any time. Windowing is used to select a specific collection from a continuous stream of bits comprising a string of collections. Furthermore, each device connected to the PCI bus 402 is addressable and, hence, selectable.

Referring now to FIG. 6A, there is illustrated a flowchart representative of the data paths used in processing of a database query using the relational engine of the disclosed embodiment. Processing begins at a start block 600 and moves to a function block 602 where the database is queried. The query process, as discussed in FIG. 1 hereinabove, is accomplished by walking down a balanced binary tree to arrive at a node having associated therewith a list of all database records with which that particular node is associated. The concept of balanced binary trees will be discussed in greater detail hereinbelow. After obtaining a query result in the form of a thread, which is a list of integers representing the records meeting the match criteria, flow moves to a decision block 604 to determine if decompression is required. If decompression is required, flow moves out the “Y” path to a function block 606 where the compressed bit stream is sent to the compander subsection 418 for decompression. Flow moves to function block 608 where the decompressed output of the compander subsection 418 is routed back to the input of Channel C (one of four input channels disclosed in this embodiment). Although, in this particular embodiment, Channel C is designated to handle decompressed information, any of the input channels A-D can be so configured provided the proper circuit connections are made.

If decompression of the query information is not required, flow moves out the “N” path of decision block 604 to another decision block 610 to determine if a collection is to be created. As mentioned hereinabove, information input to the process subsections may be in the form of a thread, which is a list of 32-bit integers, or a raw collection. A raw collection requires no processing other than to convert it from a 32-bit parallel word to a serial bit stream at the output of the process subsection block. Therefore, if the input to the process subsection is a raw collection, flow moves out the “N” path of decision block 610 to a function block 612 where the raw collection is simply converted through the input subsection to a serial bit stream and passed on to the relational processor. On the other hand, if the input to the process subsection was a list of integers, flow moves out the “Y” path of decision block 610 to a function block 614 where the thread is input to any one of the Channels A-D. Flow moves then to a function block 616 where the list of integers is converted to a collection. The collection is then output from the Channel D process subsection in serial fashion to the relational processor, as indicated in function block 618.

In either case, whether the data input to the process subsection was a raw collection or was a list of integers which was subsequently converted to a collection, the output of the process subsection is a serial bit stream which is input to a relational processor where Boolean operations are performed on the collections, as indicated in function block 620. Flow then moves to a function block 622 where the output of the relational processor is a serial bit stream. The output of the relational processor is a collection itself whose bit positions represent the results of logical operations performed on respective bit positions of the collections input to the Boolean processor. Flow then moves to a decision block 624 where the user may choose to count the number of records which met the search criteria. If the user desires to have a record count made, flow moves out the “Y” path to a function block 626 where the number of one bits is counted in the resulting collection. Flow then moves to a function block 628 where the count value is returned.

Referring now to FIG. 6B, if the records are not desired to be counted, flow moves out the “N” path of decision block 624 to the input of decision block 630 where the collection output at the relational processor as a serial bit stream may optionally be converted back into a parallel word, or into a list of integer values. (Note that after the count is returned to the user, flow also moves from function block 628 to the input of decision block 630.) If the user decides to convert the output serial bit stream of the relational processor to a parallel word, flow moves out the “Y” path of decision block 630 to function block 632, where the conversion is made. Flow then moves to a decision block 634 to determine if the parallel word is to be recursively fed back into the input. If so, flow moves out the “Y” path to a function block 636 to input the parallel word into Channel D. Note that any input process subsection may be configured for recursive processing, but in this particular embodiment, only Channel D is designed for such a capability. Flow then loops from function block 636 back to the input of decision block 610 to pass the collection through for processing by the relational processor.

If the parallel word (collection) is not to be recursively processed, flow moves out the “N” path of decision block 634 to determine if compression is needed, as indicated in decision block 640. At this point, the parallel word may be placed on the system bus in either a compressed or uncompressed state. If the parallel word is going to be compressed first, flow moves out the “Y” path of decision block 640 to a function block 642, where the word is sent to the DSP and compressed. The compressed word is then placed onto the system bus, as indicated in function block 644. The process then returns, as indicated in block 646, to process other information.

Referring back to decision block 630, if the user desires not to convert the output serial bit stream of the relational processor to a parallel word, flow moves out the “N” path to a function block 638 to convert the serial bit stream to a list of integers. This list of integers is simply a list of the records matching all of the search criteria. Flow then moves to decision block 640 where the user may then compress the integer list or output the list directly to the system bus. If the integer list is to be compressed, flow moves out the “Y” path to function block 642 where the list is sent to the DSP and compressed. The compressed output data is then placed directly onto the system bus, as indicated in function block 644. Flow then moves to return block 646 to continue the processing of information. Alternatively, if the data is not to be compressed, flow moves out the “N” path of decision block 640 to function block 644, where the data is placed directly onto the system bus. Flow is then to return block 646 to continue processing other information.
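The conversion of a serial bit stream to a list of integers, as in function block 638, may be sketched as follows. The function name collection_to_thread and the assumption that the first bit position corresponds to record index one are illustrative only.

```python
def collection_to_thread(bit_stream, first_index=1):
    """Convert a serial bit stream into an ascending list of record indexes.

    Each '1' bit contributes the integer value of its bit position.
    The first_index default of 1 is an assumption made for this sketch.
    """
    thread = []
    position = first_index
    for bit in bit_stream:
        if bit == 1:
            thread.append(position)
        position += 1
    return thread


# Made-up collection: records 2 and 5 matched all of the search criteria.
matching_records = collection_to_thread([0, 1, 0, 0, 1, 0])
```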

Referring now to FIG. 7, there is illustrated a block diagram of the core subsystem of the relational engine 400. To accommodate windowing, each of a window start register 700, a window end register 702, and a window control register 704 receives data input from the PCI bus 402. The window start register 700 and window end register 702 buffer the respective starting and ending addresses of the collection to be plucked from the continuous bit stream for processing. The window control register 704 controls whether windowing will be enabled or disabled. An output limit register 706 stores the current address of the input word. The output of the output limit register 706 feeds a limit comparator 708 which provides a check against the known number of bits which should be processed. For example, if the number of records being processed is one million, the limit comparator “knows” the limit should be set at one million. If, at the end of processing, the output limit register 706 indicates that the last record processed was at register location one million, a successful comparison results, and a scan complete flag is set. On the other hand, if at the end of processing there was a discrepancy between the value residing in the output limit register 706 and the limit comparator 708, other measures can be taken to ensure that the information is processed correctly.

The outputs of the window start register 700 and the window end register 702 are fed to a respective start comparator 710 and end comparator 712, in conjunction with a counter value output from an output counter 714. A bit is output from each comparator based upon a successful comparison with the value of the output counter 714. A match of the value of output counter 714 with the value stored in output limit register 706 results in a scan complete flag being output from the limit comparator 708. A match in each of the respective start and end comparators 710 and 712 results in a binary one being sent to a flow control block 715. The flow control block 715 receives single-bit window, preamble and postamble enable flags from the window control register 704, and outputs a single bit to a 2-to-1 multiplexer 716.

A Boolean OpCode register 718 inputs a 16-bit OpCode to the relational processor 416 to control the desired logical operations to be performed on the serial bit streams input from the Channels A-D input process subsections 500, 502, 504, and 506. A 4-to-1 multiplexer 720 also receives the serial bit streams and is used as a means to bypass the relational processor 416 when processing is not required on selected input channels. Switching control of the 4-to-1 multiplexer 720 is received from the window control register 704 to select the primary channel. The 2-to-1 multiplexer 716 selects between the serial output of the relational processor 416 (an intermediate collection which results from performing logical operations on two or more of the collections at the input of the relational processor 416) and the output of the 4-to-1 multiplexer 720 (unprocessed collections which bypass any logical operations performed by the relational processor 416). The flow control block 715 determines which of the two inputs to the 2-to-1 multiplexer 716 will be selected for output both to a serial-to-parallel converter 722 and a gate block 724. The output of the serial-to-parallel converter 722 is either a 32-bit intermediate collection or an unprocessed (raw) collection. Either of these 32-bit parallel words may be selected for output by another 2-to-1 multiplexer 726.

Either the intermediate collection or the raw collection is also input to gate 724. The gate 724 provides synchronized flow of either the raw or intermediate serialized collections to a 32-bit record counter 728 and to an index out register 730. An index counter 732 provides an increasing 32-bit parallel count value to the index out register 730 and to each of the input process subsections 500, 502, 504, and 506. A record count value from the 32-bit record counter 728 is put to the PCI bus 402 for later processing. The value in the index out register 730 is also output through the 2-to-1 multiplexer 726 for processing.

Referring now to FIG. 8, there is illustrated a block diagram of an input channel 800, according to a disclosed embodiment. The input channel (e.g., input subsection 500), as mentioned hereinabove, receives a 32-bit wide word from the PCI bus 402 and serializes it for output at a control gate 802 for input to the relational processor 416. The input channel 800 has an elastic buffer interface (FIFO) 804 which receives data from the PCI bus 402 into a first input and a load command at another input 806. The FIFO 804 has 4K registers and outputs a 32-bit wide word across a 32-bit wide bus 808 to a byte lane steering logic circuit 810.

The byte lane steering logic circuit 810 orders the bytes according to the particular byte-ordering of the host system. To provide for universal applications across many different computer platforms, the input channel 800 must be operable to handle both endian byte-ordering structures. A computer is said to be big-endian or little-endian depending on whether the least significant byte is stored at the highest or lowest address, respectively. Different byte ordering means that between certain computer systems, multi-byte quantities are stored differently. For example, the most significant byte of a four-byte hexadecimal word is stored first on the big-endian system and last on the little-endian system. Furthermore, in some computer systems, the byte-ordering may change depending on whether the word came from a register or memory. In any case, the disclosed architecture incorporates the necessary features to ensure the proper byte ordering for both big-endian and little-endian systems.
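The effect of byte ordering on a four-byte word may be illustrated with the following sketch, which uses only the standard Python integer methods to_bytes and from_bytes. The function name steer_bytes and the sample word are hypothetical; the sketch shows the reordering that byte lane steering must account for, not the circuit itself.

```python
def steer_bytes(raw_bytes, host_byteorder):
    """Assemble a 32-bit word from four bytes stored in the host's byte order.

    host_byteorder is "big" or "little".
    """
    return int.from_bytes(raw_bytes, host_byteorder)


word = 0x0A0B0C0D
stored_big = word.to_bytes(4, "big")        # most significant byte first
stored_little = word.to_bytes(4, "little")  # most significant byte last

# Either storage order yields the same 32-bit value once steered correctly.
from_big = steer_bytes(stored_big, "big")
from_little = steer_bytes(stored_little, "little")
```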

When the bytes exiting the byte lane steering logic 810 are a collection, the collection word is fed to a parallel-to-serial converter 812. From the parallel-to-serial converter 812, the bit stream is optionally sent to decompression circuit 814 (e.g., a DSP) for decompression, or to a 2-to-1 multiplexer 816 for pass-through as a raw collection.
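A minimal sketch of parallel-to-serial conversion of a 32-bit collection word, and of the reverse conversion, is given below. The function names are hypothetical, and the choice of shifting out the most significant bit first is an assumption of this sketch; the source does not state the bit order.

```python
def parallel_to_serial(word):
    """Serialize a 32-bit word into a list of bits, most significant bit first."""
    return [(word >> shift) & 1 for shift in range(31, -1, -1)]


def serial_to_parallel(bits):
    """Reassemble a 32-bit word from 32 serial bits, most significant bit first."""
    word = 0
    for bit in bits:
        word = (word << 1) | bit
    return word


serial = parallel_to_serial(0x80000001)     # a '1' in the first and last positions
restored = serial_to_parallel(serial)
```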

Integers received from the PCI bus 402 at the input to the FIFO 804 are treated differently. A dynamically-applied integer offset is injected via an integer offset circuit 818. This feature is necessary when dealing with a super collection. A super collection is defined as a collection of collections. The offset is added to the input word via a 2's complement adder 820. The output of the adder 820 is a 32-bit word sent across a 32-bit wide bus 822 to an equality comparator 824. An input 826 to the equality comparator is a 32-bit wide index pointer generated from a binary index counter. The equality comparator 824 performs a comparison between the 32-bit output word from the adder 820 and the 32-bit wide index pointer from the binary index counter. The binary index counter begins counting up from the value one. If there is a match between the counter value and the output of the adder, a compare flag bit (of binary value “1”) is output along a single path 828 to a 2-to-1 multiplexer 830. The binary counter continues counting up and outputting a compare flag bit every time a “1” bit is encountered in the word output from the adder 820. Alternatively, zeros are output for all count comparisons when a “1” bit is not encountered in the word output from the adder 820.
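The offset, compare, and fetch sequence described above may be sketched for a single channel as follows. The function name thread_to_collection, the length parameter, and the sample threads are hypothetical. The sketch assumes a strictly ascending thread and models the 2's complement adder by masking the sum to 32 bits.

```python
MASK32 = 0xFFFFFFFF


def thread_to_collection(thread, length, offset=0):
    """Convert an ascending list of record indexes into a serial bit stream.

    thread -- ascending integers (a thread); assumed strictly ascending
    length -- number of bit positions to emit (an assumption of this sketch)
    offset -- signed integer offset applied by a 32-bit 2's complement add
    """
    bits = []
    pending = iter(thread)
    current = next(pending, None)
    for counter in range(1, length + 1):           # index counter counts up from one
        if current is not None and ((current + offset) & MASK32) == counter:
            bits.append(1)                         # compare flag bit
            current = next(pending, None)          # fetch the next integer
        else:
            bits.append(0)
    return bits


plain = thread_to_collection([2, 5], 6)
shifted = thread_to_collection([10_002, 10_005], 6, offset=-10_000)
```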

Threads can be synchronized to collections. The input Channels A-D (as shown in FIG. 5), according to this embodiment, operate independently with respect to the type of input stream, either a collection or a binary integer stream. That is to say, Channel A may be processing a raw collection while Channel B is processing an integer bit stream. Synchronization occurs when a record “hit” occurs on Channel B (a “1” bit is detected by the comparator circuit 824) and the value on Channel A is converted. Furthermore, a logical operation may be performed on both the Channel A raw collection stream and the Channel B integer stream, in a single pass. The fetch logic circuit 832 provides the synchronization timing, in that, if a match occurs with the equality comparator 824, a new value is brought in, or if thirty-two bits were just consumed for serialization, another thirty-two bits are pulled in.

If the input bit stream was compressed, the decompression circuit 814 is used to decompress the bit stream prior to serialization. The 2's complement offset 818 is used to separate collections which are connected together (i.e., a super collection). Providing the 2's complement adder 820 corrects any alignment of the collections in the overall bit stream. Once set for an operation, the offset remains fixed. The offset is usually set to zero. The use of an offset permits logically connecting two or more different databases. For example, if several distinct physical databases (distributed databases) are located at respective remote locations (Toledo, Japan, and Germany), and each has customer records, these three databases can be appended to one another into a single bit stream using the offset capabilities of the disclosed input channel architecture. Each database starts at a first record and has a known number of records, the number of records being equal among the databases. For example, the database of Toledo may have 10,000 records, the database of Japan may have 10,000 records, and the database of Germany may have 10,000 records. The offset capabilities permit assembly of one contiguous bit stream, for example, having Toledo first in the bit stream with records 1-10,000, followed by the Japanese database of 10,000 records (offset by 10,000) with record locations in the bit stream of 10,001-20,000, followed by the German database of 10,000 records (offset by 10,000 with respect to the Japanese database) with record locations in the bit stream at 20,001-30,000. Therefore, a fixed offset of 10,000, in this example, can be used to find the boundaries of the various databases of records appended together as a super collection to form a contiguous bit stream.
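The boundary arithmetic of the three-database example may be worked through with the short sketch below. The function name database_boundaries and the dictionary layout are hypothetical; the record counts are those given in the example.

```python
def database_boundaries(record_counts):
    """Map each database name to its (first, last) bit position in the stream.

    record_counts -- dict of name -> number of records, in append order
    """
    boundaries = {}
    offset = 0
    for name, count in record_counts.items():
        boundaries[name] = (offset + 1, offset + count)
        offset += count                  # the next database starts after this one
    return boundaries


counts = {"Toledo": 10_000, "Japan": 10_000, "Germany": 10_000}
spans = database_boundaries(counts)
```

Under these assumptions, spans holds (1, 10000) for Toledo, (10001, 20000) for Japan, and (20001, 30000) for Germany, in agreement with the record locations stated above.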

Referring now to FIG. 9, there is illustrated the basic building blocks of the compander subsection 418. The compander subsection 418 has two 32-bit input structures (900 and 902) whose outputs tie to a common bus 904, which bus 904 provides access to a DSP 906. One input structure 900 receives a 32-bit parallel word from the PCI bus 402 into the 4K FIFO 528 (see FIG. 5). From the FIFO 528, the 32-bit word is split into two 16-bit words with one of the 16-bit words input into a first 16-bit register 908 and the other 16-bit word input to a second 16-bit register 910. Similarly, the other input structure 902 receives a 32-bit parallel word, but from the bit position-to-integer converter 518 (of FIG. 5). The 32-bit word is split into two 16-bit words with each word being input to separate 16-bit registers 912 and 914.

The DSP 906 is a 16-bit processor which has its program code stored in a non-volatile memory 916 (e.g., an EPROM). Program code is uploaded from the memory 916 to the DSP 906 at power-up. Also associated with the DSP 906 is a scratch-pad memory 918 of 16K×16-bit RAM for use as temporary storage during the companding process. The 16-bit registers (908 and 910) and (912 and 914) of structures 900 and 902, respectively, output 16-bit words to respective 2-to-1 multiplexers 920 and 922. The multiplexers control which 16-bit word of the respective structures 900 and 902 is input to the DSP 906 for processing. The DSP 906 has associated therewith two additional 16-bit output registers 924 and 926. The outputs of these two registers 924 and 926 are eventually joined to provide a 32-bit output word for placement on the PCI bus 402 or for input to Channel C for recursive processing.
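The splitting of a 32-bit word into two 16-bit words for a 16-bit processor, and the later joining of two 16-bit outputs, may be sketched as follows. The function names are hypothetical, and the assignment of the upper and lower halves to particular registers is an assumption; the source does not state which half goes to which register.

```python
def split_word(word32):
    """Split a 32-bit word into (high, low) 16-bit halves."""
    high = (word32 >> 16) & 0xFFFF
    low = word32 & 0xFFFF
    return high, low


def join_halves(high, low):
    """Join two 16-bit words into one 32-bit word."""
    return ((high & 0xFFFF) << 16) | (low & 0xFFFF)


high_half, low_half = split_word(0x12345678)
rejoined = join_halves(high_half, low_half)
```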

Serial Bit Stream Processor

The rules for the logical combination of variables are defined by Boolean algebra. The key Boolean logical operators are AND, OR, and NOT. A Boolean function consists of one or more inputs (referred to as input variables) and a single output. The single output is a function of the input variables and the logical operators. Both the input variables and the output operate on binary numbers. A thread is converted into a binary bit stream (or vector) by converting record indexes of the database into respective bit positions. A logical “1” indicates the record contains key field information which matches the search criteria. A logical “0” indicates the lack of any matching key field criteria. The disclosed index relational processor (IRP) is operable to process up to four input variable streams.

There are approximately eighty different logical combinations for one, two, three, or four binary variables. A function generator implemented in the IRP uses a “table lookup” technique to solve any Boolean equation of four variables. For four input variables, there are sixteen possible input combinations. For each input combination, the output must have a unique binary value of one or zero. The table lookup method requires every possible combination of inputs to be explicitly defined for every output having a binary zero or one. Parity generation is a modestly complicated function to implement and provides a good example for demonstrating the flexibility of the table lookup technique. An odd parity bit is defined as that bit which is added to a group of bits to ensure that the sum of the “1” bits in the group is always an odd number. As an example, a Table 2 is constructed for computing the odd parity for a 4-bit number where A, B, C and D represent the input variables comprising a 4-bit number and F is the odd parity bit.

TABLE 2
Odd Parity Generation

A  B  C  D    Odd Parity Bit (F)
0  0  0  0    1
0  0  0  1    0
0  0  1  0    0
0  0  1  1    1
0  1  0  0    0
0  1  0  1    1
0  1  1  0    1
0  1  1  1    0
1  0  0  0    0
1  0  0  1    1
1  0  1  0    1
1  0  1  1    0
1  1  0  0    1
1  1  0  1    0
1  1  1  0    0
1  1  1  1    1

A more compact interpretation of this result is to transpose the “F” output bits from a vertical to horizontal format. The odd parity bit example thus becomes a 16-bit binary word 1001011001101001 (or hexadecimal 9669), and is referred to as a function bit map (also called an OpCode). Another example using table-driven processing is provided by the following relational statement of four input variables: A≠B AND C=D→Z, where the output Z is true (a binary 1) when A differs from B and C equals D. The following Table 3 summarizes the input variable values and intermediate results.

TABLE 3
The Lookup Table Solution

A  B  C  D    X (A ≠ B)    Y (C = D)    Z (X AND Y)
0  0  0  0    0            1            0
0  0  0  1    0            0            0
0  0  1  0    0            0            0
0  0  1  1    0            1            0
0  1  0  0    1            1            1
0  1  0  1    1            0            0
0  1  1  0    1            0            0
0  1  1  1    1            1            1
1  0  0  0    1            1            1
1  0  0  1    1            0            0
1  0  1  0    1            0            0
1  0  1  1    1            1            1
1  1  0  0    0            1            0
1  1  0  1    0            0            0
1  1  1  0    0            0            0
1  1  1  1    0            1            0

By inspection it can be seen that only four input combinations result in a logical true output. The bit-mapped result is a 16-bit OpCode or binary word 0000100110010000 (also hexadecimal 0990).
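Both OpCodes may be reproduced by enumerating the sixteen input combinations, as in the sketch below. The function names are hypothetical. The sketch assumes that the output for input combination 0000 becomes the leftmost (most significant) bit of the 16-bit word, following the vertical-to-horizontal transposition described above.

```python
def build_opcode(function):
    """Build a 16-bit function bit map from a Boolean function of A, B, C, D.

    The output for ABCD = 0000 is placed leftmost (an assumption of this sketch).
    """
    word = 0
    for index in range(16):
        a = (index >> 3) & 1
        b = (index >> 2) & 1
        c = (index >> 1) & 1
        d = index & 1
        word = (word << 1) | (1 if function(a, b, c, d) else 0)
    return word


def odd_parity_bit(a, b, c, d):
    """Return True when a '1' must be added to make the count of ones odd."""
    return (a + b + c + d) % 2 == 0


def relational_statement(a, b, c, d):
    """A != B AND C == D."""
    return (a != b) and (c == d)


parity_opcode = build_opcode(odd_parity_bit)            # 0x9669
relational_opcode = build_opcode(relational_statement)  # 0x0990
```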

A Bit-Addressable Memory as a Boolean Function Processor

Referring now to FIG. 10A, there is illustrated a general block diagram of a configurable Boolean stream processor as a bit-addressable memory. As demonstrated in the two examples above, it is possible to interpret the table output (function) as a single 16-bit binary OpCode, where each of the input variable combinations is “mapped” into a unique bit location of the word. Therefore, a bit-addressable memory 1001 can be used to translate this bit-map word into a Boolean function. The bit-addressable memory 1001 consists of individual flip-flops (binary memory devices) that can be individually and selectively read back. For the IRP, a 16-bit memory is organized so that a single word Write will set or reset the individual flip-flops in the memory. Separate address inputs A, B, C, and D allow the individual flip-flops in the bit-addressable memory 1001 to be selected for a single bit Read. These separate address inputs A, B, C, and D are selected via the 16-bit OpCode fed in at a 16-bit map OpCode input 1003. Therefore, if the separate address inputs A, B, C, and D are interpreted as input variables and the contents of the bit-addressable memory 1001 represent the function results, the bit-addressable memory 1001 operates as a universal Boolean function generator. When implemented as a field programmable logic array, the bit-addressable memory 1001 provides a solution in which the function result is available at the output 1005 in under 30 nsec.
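The use of a 16-bit word as a bit-addressable lookup may be sketched in software as follows. The function names read_bit and process_streams are hypothetical, as is the bit ordering, in which input combination 0000 addresses the leftmost bit of the OpCode. The OpCode 0x000F used in the example is a constructed value that is true only when A and B are both one with C and D held at zero or one, i.e., A AND B; it does not appear in the source.

```python
def read_bit(opcode, a, b, c, d):
    """Read the single bit of a 16-bit OpCode addressed by inputs A, B, C, D."""
    index = (a << 3) | (b << 2) | (c << 1) | d
    return (opcode >> (15 - index)) & 1      # combination 0000 is the leftmost bit


def process_streams(opcode, stream_a, stream_b, stream_c, stream_d):
    """Apply the OpCode to four serial bit streams, one bit position at a time."""
    return [
        read_bit(opcode, a, b, c, d)
        for a, b, c, d in zip(stream_a, stream_b, stream_c, stream_d)
    ]


A_AND_B = 0x000F                             # true for ABCD = 1100 through 1111
output = process_streams(A_AND_B, [0, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 0])
```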

Converting Thread Data To A Collection

As previously noted, threads contain 32-bit integers sorted in ascending order. Each integer represents a physical record index where a specific query item may be found in the database. In order to use the integers for relational processing, they need to be transformed into bit positions in a contiguous binary stream (or collections). This transformation process is accomplished by five circuit elements: an input FIFO memory, an equality comparator, a binary up-counter, a timing generator, and an output FIFO memory. The output port of the input FIFO memory represents the four input variables for Boolean processing. As previously noted, the four input variables (or threads) contain the list of integer values representing the record locations in the database.

The equality comparators (four independent units) compare the value of the counter to the output of each input FIFO. The output of each equality comparator is a single bit. If the two 32-bit integer inputs to the equality comparator are the same value, the single bit output is true. If the values are not the same, the output is false.

The up-counter is a 32-bit synchronous design capable of being clocked at a 50 MHz rate. The counter output is compared to the input FIFO memory port outputs (at the equality comparator).

The conversion from integer to bit stream (or collection) begins with the up-counter being initially cleared to zero. The up-counter increments by a value of one until such time that input processing is complete. The clock pulse driving the up-counter originates from the timing generator. The timing generator provides all the sequencing pulses needed to perform the integer-to-binary stream conversion. The timing generator synchronizes on (1) the availability of data from the input FIFO, and (2) the output FIFO Not Full status flag. The timing generator senses the output of the Boolean function generator, and if it is true, the timing generator produces a timing pulse to load the up-counter value into the output FIFO. The timing generator is controlled by a master 50 MHz clock. The output FIFO “collects” the counter values when the function generator is true.

In operation, utilizing the above-described process implemented in specialized hardware, multiple threads can be logically processed into an index collection at very high speeds in a single pass using host-defined logical relationships. Basic steps taken to achieve this result include (but not necessarily in this order): (1) feeding a bit map to the input of the Boolean function generator, (2) initializing the up-counter to zero, (3) beginning the conversion process only if the output FIFO is not full and multiple thread data (up to four) are available for output from the respective input FIFOs, (4) comparing the values of the input FIFO outputs to the reference up-counter using equality comparators, and outputting from each equality comparator a single bit for a total of four bits which are input to the Boolean function generator (also called the index relational processor), (5) copying into the input port of the output FIFO the up-counter value if the function output of the Boolean function generator is true, (6) advancing to the next sequence value any input FIFO having a value that matches the up-counter, (7) incrementing the up-counter by one, and (8) repeating steps (3)-(7) if the process is not complete.
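Steps (1) through (8) can be sketched in software as a loop over the up-counter. This is a simplified model under stated assumptions, not the disclosed circuit: the function name `threads_to_collection` is hypothetical, each thread is taken to be a strictly ascending list of non-negative integers, an exhausted thread is treated as contributing a 0 bit, FIFO full/empty handshaking is ignored, and the process is considered complete once all threads are exhausted. The OpCode bit ordering (row 0000 as the most significant bit) is likewise an assumption.

```python
def threads_to_collection(threads, opcode):
    """Combine up to four sorted integer threads into one index collection."""
    threads = list(threads) + [[]] * (4 - len(threads))   # pad to channels A-D
    heads = [0, 0, 0, 0]          # read position in each input FIFO
    counter = 0                   # step (2): up-counter cleared to zero
    collection = []               # stands in for the output FIFO
    while any(heads[i] < len(threads[i]) for i in range(4)):
        # Step (4): one equality comparator per channel yields one bit each.
        bits = [
            1 if heads[i] < len(threads[i]) and threads[i][heads[i]] == counter else 0
            for i in range(4)
        ]
        address = (bits[0] << 3) | (bits[1] << 2) | (bits[2] << 1) | bits[3]
        # Step (5): store the counter value when the function output is true.
        if (opcode >> (15 - address)) & 1:
            collection.append(counter)
        # Step (6): advance every input FIFO whose head matched the counter.
        for i in range(4):
            heads[i] += bits[i]
        counter += 1              # step (7)
    return collection


thread_a = [1, 3, 5]
thread_b = [3, 4, 5]
print(threads_to_collection([thread_a, thread_b], 0x0990))  # A != B AND C = D: [1, 4]
print(threads_to_collection([thread_a, thread_b], 0x000F))  # A AND B: [3, 5]
```

Stepping the counter one value at a time is slow in software but mirrors the hardware, where each counter value costs one clock period regardless of how many threads match.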

Referring now to FIG. 10B, there is illustrated a block diagram of the Boolean relational processor 416. The relational processor 416 consists of a single 16-bit register 1000 which defines an operational code (OpCode). The 16-bit register 1000 receives a 16-bit wide word from the PCI bus 402, the word loaded according to a command received at a load (LD) port 1002. A 1-of-16 selector 1004 receives a 16-bit parallel output of the 16-bit register 1000 and outputs a single bit for the result. The 1-of-16 selector 1004 is controlled by a path-enable gate 1006. The path-enable gate 1006 controls the 1-of-16 selector 1004 by selecting which of the sixteen bits will be allowed to pass through to the output of the 1-of-16 selector 1004. The path-enable gate 1006 has as four of its inputs, the serial output bit streams of each of the four Channels A-D (of FIG. 5). A Boolean operation is performed on respective bit positions of selected bit streams to provide an output word for control of the 1-of-16 selector 1004 output. A 4-bit enable circuit 1008 provides the Boolean operation code to the path-enable gate 1006.

The 4-bit enable circuit 1008 receives a 4-bit word from the PCI bus 402, and in conjunction with a masking word input at port 1010, provides the Boolean operation control word to the path-enable gate 1006. The Boolean operation control word provides the Boolean operation to be performed by the path-enable gate 1006. For example, consider the four sample bit streams for Channels A-D in Table 4 below. If only Channels C and D were selected for processing according to a logical AND operation, the values in bit position one of both Channels C and D would be logically ANDed, then the values of bit position two, and so on, until all sixteen bits were processed. The result is a 4-digit hexadecimal value (8888, in this example) which is passed to the 1-of-16 selector 1004 and indicates to the selector 1004 which one of sixteen bits from the 16-bit register are to be passed to the processor output.

TABLE 4

  Channels
A  B  C  D    C · D    Hex
0  0  0  0      0
0  0  0  1      0
0  0  1  0      0
0  0  1  1      1       8
0  1  0  0      0
0  1  0  1      0
0  1  1  0      0
0  1  1  1      1       8
1  0  0  0      0
1  0  0  1      0
1  0  1  0      0
1  0  1  1      1       8
1  1  0  0      0
1  1  0  1      0
1  1  1  0      0
1  1  1  1      1       8
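The C · D column and the Hex column of Table 4 can be reproduced with a few lines of code. This is only an illustrative sketch; the helper name `nibble` is hypothetical, and the convention that the first row of each group of four is the least significant bit of the hexadecimal digit is inferred from the table (a single 1 in the fourth row is shown as hexadecimal 8), not stated in the text.

```python
# Sample bit streams for Channels A-D: sixteen bit positions, counting 0000..1111.
rows = [((r >> 3) & 1, (r >> 2) & 1, (r >> 1) & 1, r & 1) for r in range(16)]

# Only Channels C and D are selected, combined with a logical AND.
c_and_d = [c & d for (_a, _b, c, d) in rows]


def nibble(bits):
    """Pack four bits into one value; assumed order: first bit is the LSB."""
    return sum(bit << position for position, bit in enumerate(bits))


hex_word = "".join(format(nibble(c_and_d[i:i + 4]), "X") for i in range(0, 16, 4))
print(hex_word)  # prints "8888"
```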

According to the disclosed embodiment, the Boolean relational processor 416 is configurable for a wide variety of logical operations which can be performed on the bit streams.

Dual Concurrent FIFO

The semiconductor industry has developed numerous first-in/first-out (FIFO) chip solutions. Classical FIFOs are optimized for speed and not for memory size. This is primarily due to their popular use as elastic buffers for disk and high speed communications systems. The great advantage in using these “off-the-shelf” FIFOs is that all the elements for the FIFO are contained in one integrated circuit. The FIFO integrated circuits are also cascadeable so that larger buffer sizes can be created. Unfortunately, the largest size of a classical FIFO (e.g., 64 KB) is insufficient for use with the disclosed relational engine. The disclosed architecture requires at least sixty-four megabytes for the buffer. Therefore a hybrid solution is required.

A classic FIFO is a first-in/first-out data device where data appearing at the input appears at the output after some slight delay. The disclosed dual concurrent FIFO is different in this respect, i.e., data input to the FIFO only becomes available when a flip signal is sent. The disclosed dual concurrent FIFO is similar to a ping-pong buffer where two buffers are present and the first is an input buffer with the second as an output buffer. However, this is where the similarities end. The functions of the twin virtual buffers flip when the flip signal is sent such that the first buffer (set of memory locations in the memory array 1108) now becomes an output device and the second buffer (set of memory locations) becomes an input device. When flipped, it acts as a classic FIFO in that the data that went in first comes out first. However, in addition to the classic FIFO operation, as the data is output, those memory locations become available for input by the other buffer. Therefore, as the output FIFO is unloading memory locations, the input FIFO can start accessing those memory locations for inputting more data. It can be appreciated that the input data cannot be accessed until a flip signal is sent. Furthermore, the memory locations are thirty-two bits wide, so that 32-bit words can be continually loaded and unloaded. This architecture lends itself well to recursive iterations since the size of the binary words used during the recursive process never changes. This is one large buffer having two memory spaces in a single memory array 1108 from which to perform data I/O, not two separate buffers as in the classic ping-pong buffers. A state machine synchronizes the memory loading and unloading.

Referring now to FIG. 11, there is illustrated a concurrent FIFO (CCFF) definition block diagram, according to the disclosed architecture. The CCFF control and counters circuit 1100 has a 32-bit wide input 1102 and a 32-bit wide output 1104. Interfaced to the CCFF 1100 is a FIFO memory 1108 comprising a DRAM array having a size of 64 Megabytes. The interface from the CCFF control and counter circuit 1100 to the FIFO memory 1108 also includes a 12-bit wide address bus 1110, and a 3-bit control bus 1112. The twelve address lines 1110 are necessary for DRAM multiplexing. The three control lines 1112 accommodate RAS, CAS, and write enable control of the FIFO memory 1108. The CCFF control and counter circuit 1100 has a load input 1114 and fetch output 1116, the functions of which are toggled when a flip signal is present at a flip input 1118. The CCFF control and counter circuit 1100 is a state machine which synchronizes the storage of data in the FIFO memory array 1108. Synchronization is important in a memory having a fixed number of available buffers for swapping data in and out. Therefore, the CCFF control and counter circuit 1100 further comprises two output commands, one of which signals when the input buffers are full (input-full 1120) and the other which signals when the output is empty (output-empty 1122). These outputs are toggled according to the state of the flip signal at the flip input 1118. The CCFF control and counter circuit can be reset by placing such a command at a Con-FIFO reset input 1124. A 50 MHz clock provides the timing for the CCFF control and counter circuit 1100.

Referring now to FIG. 12, there is illustrated the concurrent FIFO status logic. The first-time block 1200 provides the starting point or reset function of the CCFF control and counter circuit 1100. Otherwise, the FIFO always looks like it is full. The mode block 1202 determines the mode in which the dual FIFOs are operating, based upon the flip input signal at the flip input 1118. A reset line 1204 receives a reset command from the Con-FIFO reset line 1124 and the flip signal present at the flip input 1118 clocks the reset signal 1124 through to the Q outputs of both the mode block 1202 and first-time block 1200. Each time the flip signal is received, it toggles the flags of input_full 1120 or output_empty 1122 according to the respective inputs of comp X/comp Y 1206 or A empty/B empty 1208.

Referring now to FIG. 13, there is illustrated a timing generator and associated gate control logic of the CCFF control and counter circuit 1100. A timing generator 1300 receives the following inputs: output_empty 1122, input_full 1120, Con-FIFO Reset 1124, LD_input 1114 and fetch_output 1116, and generates the timing according to a 50 MHz clock input. Depending on the mode signal at the mode input 1302, the gate logic output circuits 1304 and 1306 (which are counters) increment outputs Out A/B 1308 or In A/B 1310.

Referring now to FIG. 14, there is illustrated the concurrent FIFO counter and memory I/O coupling. The general concept for operation of the CCFF is a single memory space used in accordance with the operation of two FIFOs, the two FIFOs (A and B) separated by a moving “demarcation line.” Control of the CCFF is such that the memory addresses in use by a first FIFO (designated FIFO A) are never used simultaneously by a second FIFO (designated FIFO B). In operation, the outputs of the gate logic output circuits 1304 and 1306 (of FIG. 13) drive independent 24-bit counters (1400, 1402, 1404, and 1406). Gate logic output circuit 1304 drives counters 1402 and 1406, while gate logic output circuit 1306 drives counters 1400 and 1404. Upon first use, no data exists in the CCFF memory space, therefore no data exists in either of FIFO A or FIFO B. When data is first input into FIFO A, the values in both of the FIFO A input counter IN-A 1400 and the FIFO A output counter OUT-A 1402 are zero. As data is stuffed into the registers of FIFO A, the IN-A counter 1400 starts to change, incrementing up the address space to track the filling of memory locations with data.

The first-time block 1200 sets the location of the demarcation line during this first-use operation under software control, which software senses that no more data is available for input, and causes a flip signal to be sent switching input/output roles of the A and B FIFOs. It can be appreciated that the initial input operation to FIFO A could possibly fill the complete 64 MB memory 1108 (note that larger memories could also be used), although this is unlikely, and is of no consequence if such a scenario occurs. Once the software signals the end of the input operation to FIFO A, the roles switch so that FIFO A now becomes an output structure, and FIFO B then begins receiving input data. Data input to FIFO B begins at the N+1 memory location from the last memory location N of FIFO A into which data was stuffed. FIFO A now begins to output data as an output structure, and FIFO B continues to receive data until software, again, determines that it is time to switch input/output roles of the FIFOs A and B. Since it is conceivable that the input operation to FIFO A could encompass all of the memory 1108, FIFO A must then be emptied first. Additionally, since the size of the data does not change, the input/output operations of the FIFOs A and B occur in lockstep (i.e., one data word being moved out of FIFO A tracks with one data word being input to FIFO B, and vice versa). The recursion operation of the relational engine operates to first fetch information from the CCFF, which fetch then releases memory locations for use in the input operation of FIFO B, and vice versa.

During the second and subsequent data input operations of FIFO A, a compare operation occurs with an A/B comparator 1412 which tracks how full FIFO A is by comparing the value of the IN-A counter 1400 with the value of the OUT-B counter 1406 (the value of the OUT-B counter set by now knowing the location of the demarcation line, i.e., the next memory location after the memory address associated with the “virtual” demarcation line). Notably, the counters are a backup mechanism to the software, which software has a primary function of determining when the roles switch between the dual FIFOs. When a match occurs (indicating that the data input pointer is at the last memory register in FIFO A, which is next to the first memory register of FIFO B, the boundary between the last register of FIFO A and the first register of FIFO B defining the demarcation line), a bit is set (representing a COMP X signal) to stop the input operation for FIFO A.

When the data input operation to FIFO A is complete, data input to FIFO B begins, and FIFO A toggles to an output mode. To facilitate data input to FIFO B, the resulting counter value from the IN-A counter 1400 of FIFO A is then shifted into both of the IN-B counter 1404 and OUT-B counter 1406. Data input to FIFO B commences by the filling of memory registers from the last memory location of FIFO A indicated by the value of the IN-A counter 1400. The OUT-B counter 1406 is stuffed with the value of the IN-A counter 1400 so that the first-out operation of FIFO B begins where the demarcation line of FIFO A memory space ended, which is the resulting value of the IN-A counter 1400 representative of the last FIFO A memory location filled with data. As the data input process proceeds in FIFO B, a compare operation is performed continually to ensure that the process of inserting data into FIFO B does not overwrite any memory registers of FIFO A. As the value of the IN-B counter 1404 increments to the last memory location filled in FIFO B, a compare operation occurs with a B/A comparator 1414 which compares the value of the IN-B counter 1404 with the value of the OUT-A counter 1402. When a match occurs, a bit is set (representing a COMP Y signal) indicating that FIFO B is now full.

When a Flip signal is issued causing the FIFO A to toggle to the output mode, a first-out data output operation begins in FIFO A, with the OUT-A counter 1402 incrementing up the FIFO A memory space as data is output from the FIFO A registers. A comparator 1408 continually compares the values of the OUT-A counter 1402 and the IN-A counter 1400 such that when the values are equal, an A-EMPTY signal is issued which stops the output of data from FIFO A, and which indicates that FIFO A is now empty, and ready for data input.

Similarly, as the Flip signal indicates that FIFO B is to now operate in a first-out mode, the value of the OUT-B counter then increments up until it matches the value of IN-B, and a B-EMPTY signal is issued which indicates that FIFO B is empty of data and ready for further input, and at which time the counters are reset to zero. Activity is then toggled back to FIFO A and the value of the IN-B counter of FIFO B is then moved over to the IN-A and OUT-A counters of FIFO A. Therefore the location of the demarcation line changes each time the Flip signal is sent, as it moves through the CCFF memory space. Eventually, the last memory location of the memory 1108 is reached, and the input/output operation rolls over to the first memory location of the memory 1108 and continues from there. Notably, if any of the input channels lacks an input word, a state machine which monitors the input activity of the input channels and controls the Boolean processor halts processing of the Boolean processor in order to wait for another word to be input at that specific input channel, such that the Boolean processor will always be processing information of the input channels.
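The behavior described above (one memory array, two virtual FIFOs, data hidden until a flip, and fetched locations being released to the input side) can be modeled with a short software sketch. This is a toy model under stated assumptions, not the disclosed counter logic: the class name `DualConcurrentFifo` and its method names are hypothetical, word counts are used in place of the four hardware counters and comparators, and a flip is assumed to be issued only after the output side has been drained.

```python
class DualConcurrentFifo:
    """Toy model: one memory array shared by an input FIFO and an output FIFO."""

    def __init__(self, size):
        self.mem = [None] * size
        self.size = size
        self.in_ptr = 0      # next location to be written (input side)
        self.out_ptr = 0     # next location to be read (output side)
        self.in_count = 0    # words loaded since the last flip
        self.out_count = 0   # words still waiting to be fetched

    def input_full(self):
        return self.in_count + self.out_count == self.size

    def output_empty(self):
        return self.out_count == 0

    def load(self, word):
        if self.input_full():
            raise OverflowError("input full")
        self.mem[self.in_ptr] = word
        self.in_ptr = (self.in_ptr + 1) % self.size    # roll over at end of memory
        self.in_count += 1

    def fetch(self):
        if self.output_empty():
            raise IndexError("output empty")
        word = self.mem[self.out_ptr]
        self.out_ptr = (self.out_ptr + 1) % self.size  # location is freed for input
        self.out_count -= 1
        return word

    def flip(self):
        # Assumption: the output side has been emptied before roles switch.
        if not self.output_empty():
            raise RuntimeError("output side must be drained before a flip")
        self.out_ptr = (self.in_ptr - self.in_count) % self.size
        self.out_count = self.in_count
        self.in_count = 0


fifo = DualConcurrentFifo(4)
for word in (1, 2, 3):
    fifo.load(word)
print(fifo.output_empty())   # loaded data is not visible before a flip: True
fifo.flip()
print(fifo.fetch())          # prints 1
fifo.load(10)                # fills the last free location
fifo.load(11)                # reuses the location released by the fetch above
print(fifo.input_full())     # prints True
```

The demarcation line in this model is simply the position of `in_ptr` at the moment of the flip; it moves through the memory and wraps around as described in the text.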

Comp X and Y outputs 1416 and 1418 are employed to ensure that the virtual FIFO memory spaces in the 64-Megabyte DRAM memory array 1108 do not collide. Thus the values in the 24-bit IN-A counter 1400 and the 24-bit OUT-B counter 1406 are compared in the A/B comparator 1412. When a match occurs, a single bit is output at the COMP X output 1416 indicating the FIFO B buffer locations are about to be overwritten with values input by the IN-A counter 1400. Similarly, the values in the 24-bit OUT-A counter 1402 and the 24-bit IN-B counter 1404 are compared in the B/A comparator 1414. When a match occurs, a single bit is output at the COMP Y output 1418 indicating that the FIFO A buffer locations are about to be overwritten with values input by the IN-B counter 1404. A 4-to-1 multiplexer 1420 receives four 24-bit inputs from the four counters (1400, 1402, 1404, and 1406) and outputs one 24-bit word in two 12-bit parts (an upper 12 bits and a lower 12 bits) to a 2-to-1 multiplexer 1422. The 2-to-1 multiplexer 1422 receives the upper and lower 12-bit words and outputs a single 12-bit word to the memory array 1108 for addressing internal registers. Data is staged to and from the DRAM memory array 1108 over a local bus 1430 using an input register 1424 (holding register), a tri-state buffer 1426, and an output register 1428. The memory array 1108 is controlled using row address strobe (RAS), column address strobe (CAS) and WE (write enable) inputs.
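The counter width and the two-part address can be checked with a small calculation. The sketch below is illustrative; the names `ADDRESS_BITS` and `split_address` are hypothetical, and which 12-bit half is presented with RAS and which with CAS is not stated in the text, so the code only separates the halves.

```python
WORD_BYTES = 4                       # memory locations are 32 bits wide
MEMORY_BYTES = 64 * 2**20            # 64-Megabyte DRAM array
WORDS = MEMORY_BYTES // WORD_BYTES   # number of addressable 32-bit words
ADDRESS_BITS = (WORDS - 1).bit_length()


def split_address(counter_value):
    """Split a 24-bit counter value into its upper and lower 12-bit parts."""
    upper = (counter_value >> 12) & 0xFFF
    lower = counter_value & 0xFFF
    return upper, lower


print(ADDRESS_BITS)                                        # prints 24
print([format(part, "03X") for part in split_address(0xABCDEF)])
```

The result shows why 24-bit counters and a 12-bit multiplexed address bus are sufficient for a 64-Megabyte array of 32-bit words.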

Run-Length Compression Architecture

Referring now to FIG. 15, there is illustrated a conventional Huffman coding scheme. A Huffman scheme is based upon statistical coding which means that the probability of a symbol has direct bearing on the length of its representation. The more probable the occurrence of a symbol, the shorter will be its bit-size representation. One example of this type of implementation is the Morse code. In the Morse code, the letter “E” has the highest frequency of occurrence in the English vocabulary, and is therefore represented by the shortest symbol, a single dot. Other less frequently occurring symbols like an “X” are assigned combinations of dots and dashes. One problem with the Morse code was defining the beginning and end of a symbol. This was solved by instituting a pause between every symbol. Huffman coding detects spaces between symbols in the variable-length storage scheme and thus, a message can be encoded in a continuous sequence of bits.

Huffman trees are a special form of binary trees. All that is needed to build such a tree 1500 is a list of symbols with associated frequencies of occurrence, e.g. {(A, 52) (B, 7) (C, 8) (D, 8) (E, 12) (F, 2) (G, 1) (H, 1) (I, 4)}, or relative frequencies, e.g. {(A, 0.547) (B, 0.074) (C, 0.084) (D, 0.084) (E, 0.126) (F, 0.021) (G, 0.011) (H, 0.011) (I, 0.042)}, which are used to estimate the respective probabilities. From the list above, it can be seen that the symbol “A” has a high relative frequency or probability, while symbol “G” will only rarely appear within a message.

The binary tree 1500 is built from the bottom-up starting with the two least frequent symbols (e.g., G and H). Within a tree 1500, a leaf node 1502 holds a single symbol, while a branch node 1504 contains composites holding the accumulated set of all the symbols that lie below it, as well as the sum of all the respective frequencies. Each new branch node 1504 points to those two still unbound leaf 1502 or branch 1504 nodes with the smallest original or accumulated frequencies. Notice that 1's and 0's are used to note the direction taken from a branch node 1504 as right or left, respectively. The binary digits are used to form the very content of the message to be transmitted.

The encoding process begins by working down from the top branch node 1506 of the tree 1500. In searching for a specific symbol located at a leaf node 1502, the associated “1” or “0” is recorded depending on whether a respective right turn or left turn is taken from a branch node 1504. For example, the symbol string “ABCDEFGHI” will be encoded as “1 0110 0111 000 010 00110 001110 001111 0010.” The symbol “A” is found off the top branch node 1506 of the tree 1500 by taking a right turn from the top branch node 1506 to the first leaf node 1508. By recording a “1” bit, the symbol “A” is encoded, according to this particular tree. Next, the symbol “B” is encoded as a “0110” by taking a left turn (recording a “0” bit) off the top branch node 1506 of the tree 1500 to a branch node 1510, then a right turn (recording a “1” bit) to branch node 1512, then another right turn (recording a “1” bit) from branch node 1512 to a branch node 1514, and finally a left turn (recording a “0” bit) from branch node 1514 to a leaf node 1516 where the symbol “B” resides. The process then stops for that symbol with a resulting bit string of “0110.” The encoding process continues in a similar manner for the remaining symbols C through I. (Note that in the above bit stream, the spaces are added only to improve readability for this discussion, where in actual practice the bit stream is continuous.)

The decoding process uses the same tree and, again, begins by working down from the top branch node 1506 of the tree 1500. If the bit is set to “1” it will cause a right turn, while “0” causes a left turn. Downward movement continues until the leaf is reached. Looking at a particular section of the bit stream used by way of example above, a “0110 010” results in the symbols “B” and “E” being decoded. For example, “0110” is executed by taking the direction indicated by the first bit (bit “0”) from the top branch node 1506. This being a left turn, flow continues to branch node 1510 where the second bit (bit “1”) indicates that a right turn should be taken. Flow moves to the next branch node 1512 where the third bit (bit “1”) indicates that a right turn should be taken. Next, flow moves to a branch node 1514 where the fourth bit (bit “0”) indicates that a left turn should be taken. This being leaf node 1516, the process stops and obtains the value associated with that node location (a symbol “B” in this particular tree).
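The encoding and decoding walks described above can be sketched with a code table in place of an explicit tree. The table below is an assumption read off the encoded example string “ABCDEFGHI” in the text (it is not taken from FIG. 15 directly), and the function names `encode` and `decode` are hypothetical. Because no code word is a prefix of another, matching accumulated bits against the table is equivalent to walking from the top branch node to a leaf.

```python
# Code words inferred from the worked example: "1" is a right turn, "0" a left turn.
CODES = {
    "A": "1", "B": "0110", "C": "0111", "D": "000", "E": "010",
    "F": "00110", "G": "001110", "H": "001111", "I": "0010",
}


def encode(symbols):
    """Concatenate the code word of each symbol into one continuous bit string."""
    return "".join(CODES[s] for s in symbols)


def decode(bits):
    """Consume bits one at a time until a complete code word (a leaf) is reached."""
    inverse = {code: symbol for symbol, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a code word")
    return "".join(out)


print(encode("ABCDEFGHI"))
print(decode("0110010"))  # prints "BE"
```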

Comma Codes

The disclosed run-length technique describes a binary bit stream run-length companding process. It adapts the output for both run-length outputs and random pattern (literal) outputs. Short-term “trend” statistics are evaluated to invert the bit stream, if required. The inversion process keeps the compression factor equal for runs of contiguous “1” or “0” bits. Whereas conventional two-pass systems require the inclusion of a conversion key table for translation of the encoded data, the disclosed run-length encoding technique offers a single-pass solution using “comma codes,” with a stop limit on negative compression and no need for inclusion of a translation table. (Negative compression is where the resulting encoded output bit stream is bigger than the raw input bit stream.) Negative compression occurs if the output file “code set” is statistically suboptimal. Without a priori knowledge of the file statistics, the possibility of negative compression does exist in the disclosed technique. Compression occurs on any run length of five or more bits. Any run length of four bits or less is passed through as an uncompressed (literal) code. Run-length counts are “thresholded” into three discrete counter lengths. An end-of-file code uniquely exists as a zero-length literal code. Odd-length file terminations are resolved in both literal mode and run-length mode. A unique code exists for binary stream inversion.

The basic format used for the disclosed compression technique is a variable-length bit code, commonly referred to as a comma code, prefixed to a variable length compression operator. The disclosed embodiment comprises seven comma codes: a first comma code denoted in the output stream by a single “0” bit (also called an 8-bit literal code), a second comma code denoted in the output stream by a binary “10” (also called a fixed 3-bit run-length counter with an implied “1” bit), a third comma code denoted in the output stream by a binary “110” (also called an inversion code), a fourth comma code denoted in the output stream by a binary “1110” (also called a fixed 6-bit run-length counter with an implied “1” bit), a fifth comma code denoted in the output stream by a binary “11110” (also called a variable run length with an implied “1” bit), a sixth comma code denoted in the output stream by a binary “111110” (also called a variable run length with no implied “1” bit), and a seventh comma code denoted in the output stream by a binary “111111” (also called a variable literal, and which has the dual purpose of providing an end-of-stream (EOS) termination code). The order in which the comma codes are executed during analysis of 8-bit blocks of the input bit stream is important, and is discussed in greater detail hereinbelow. By using any of the above-mentioned comma codes, any binary stream can be compressed effectively during a single pass.

Referring now to FIG. 16, there are illustrated the basic structures of the comma codes. The first comma code 1600 is the 8-bit literal, and outputs a single binary “0” bit 1602. The first comma code 1600 is assigned as a literal output code (“literally” the same uncompressed bits as the input string). The first comma code body 1604 (bits B₁-B₈) of the output literal is fixed at a length of eight bits, since the relational processor analyzes input blocks of eight bits at a time. Fixing the length at eight bits is significant for two reasons. First, the total length of an output literal code is limited to no more than nine bits (the single comma code bit “0” followed by the eight input bits). The first comma code 1600 operates on a threshold of four bits such that when a run length of similar bits fails to exceed three, the literal string image of eight bits is appended to the single comma code binary “0” bit 1602. Thus the worst-case negative compression is limited to 112.5% (computed as the (number of output bits) divided by (number of input bits)=9/8, or 112.5%). Second, the “break even” point for inserting an inversion code is eight bits. The break even point is defined where the length of the output code is the same length as the input code. (The inversion code is discussed in greater detail during the discussion of the third comma code hereinbelow.)

A second comma code 1606 is the fixed 3-bit run-length counter with an implied “1” bit. The code length is a total of five bits (the two binary 10 bits 1608 plus a fixed 3-bit count 1610 (C₁-C₃)). The second comma code 1606 is assigned to operate on bit streams having short run lengths of four to eleven bits, inclusive (i.e., has a threshold of four bits). The fixed 3-bit count 1610 is the binary representation of the decimal value of the number of bits being compressed. This 3-bit counter code includes an offset of four such that the run length is computed by adding the value of four to the 3-bit table address. For example, if the input bit stream has nine “0” bits which are to be compressed, the value in the fixed 3-bit count 1610 would be a binary representation of a decimal nine offset by a value of four (or binary 101). It should be noted that the disclosed run-length technique operates to compress zeros. Therefore, run lengths of “1” bits are inverted to zeros for compression. Consequently, a run length of “0” bits is assumed to terminate by the presence of a “1” bit. The terminating “1” bit is also called an “implied” one bit. The implied bit is automatically absorbed into the compressed string since it is known that the string of similar bits terminates at a bit change. When including the implied bit, the actual string length encoded is from 5-12 bits, since the implied “1” bit is included in the bit string for encoding purposes.
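The second comma code can be illustrated with a short encoder and decoder. This is a sketch under stated assumptions rather than the disclosed circuit: the function names `encode_fixed3` and `decode_fixed3` are hypothetical, the bit stream is modeled as a string of “0” and “1” characters, and the count field is taken to be the run length of zeros minus the offset of four, as in the nine-zero example above.

```python
def encode_fixed3(zero_run):
    """Encode a run of 4 to 11 zeros (followed by an implied 1) in five bits."""
    if not 4 <= zero_run <= 11:
        raise ValueError("fixed 3-bit code covers run lengths 4..11 only")
    return "10" + format(zero_run - 4, "03b")


def decode_fixed3(code):
    """Expand a five-bit code back into the zeros plus the implied terminating 1."""
    if len(code) != 5 or code[:2] != "10":
        raise ValueError("not a fixed 3-bit run-length code")
    zeros = int(code[2:], 2) + 4
    return "0" * zeros + "1"


print(encode_fixed3(9))        # prints "10101"
print(decode_fixed3("10101"))  # prints "0000000001" (nine zeros and the implied 1)
```

For nine zeros plus the implied bit, ten input bits are replaced by five output bits; for the minimum run of four zeros, five input bits are replaced by five output bits, which is the break even case.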

A key issue for this second comma code 1606 is the break even point (also a key issue for all of the other comma codes, for that matter), such that exceeding the break even point results in negative compression. According to this second comma code 1606, an implied “1” bit is assumed at the end of a string of “0” bits. Therefore, a minimum run length of four binary zero bits with a trailing “1” bit (as specified for this comma code) represents a minimum run length that can be encoded without any negative compression. Since an input stream having a run length less than four bits (plus a trailing implied bit) would be less than the output code which is stipulated at five bits, negative compression would occur.

The third comma code 1612 is the inversion code, and outputs a binary 110. It has a fixed length of three bits. The third comma code 1612 is inserted into the output data stream to indicate that the bit trend is opposite to what is currently being processed. The third comma code 1612 is applied when a string of contiguous “1” bits exceeds seven bits (a threshold of eight “1” bits) in length (since strings of zeros are more likely to occur, inversion of 1-bits to zeros is desirable to extend the compression of the bit stream). Application of the third comma code 1612 triggers use of another comma code which provides compression of the run length of similar bits. For example, if the run length of similar bits is less than twelve, the fixed 3-bit run-length counter is used; if the run length is less than seventy-six similar bits, a fixed 6-bit run-length counter is used; and if the run length exceeds seventy-five bits, a variable run length comma code is used.
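The choice among the run-length codes by run length can be summarized as a small selector. The sketch below is illustrative only; the function name `select_run_code` and the returned labels are hypothetical, and the treatment of runs shorter than four bits as literals is taken from the description of the first comma code rather than from the paragraph above.

```python
def select_run_code(run_length):
    """Pick the code used for a run of similar bits of the given length."""
    if run_length < 4:
        return "literal"              # too short to compress without loss
    if run_length < 12:
        return "fixed 3-bit"          # run lengths 4..11
    if run_length < 76:
        return "fixed 6-bit"          # run lengths 12..75
    return "variable"                 # run lengths 76 and above


for n in (3, 8, 11, 12, 75, 76):
    print(n, select_run_code(n))
```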

The threshold is determined by the concatenation of the fixed 3-bit counter code 1606, which has five bits, to the inversion code 1612, which has three bits. As an example, where a string of “0” bits was just processed but now a string of “1” bits appears to be the current trend, an inversion code 1612 will be inserted in the output stream to note the point at which the bits toggled from 0's to 1's. The inversion code 1612 must be inserted into the output data file to indicate compression in “inverted” mode. The actual fixed 3-bit run-length code 1606 appended to the inversion code 1612 depends on the final run length count. The stream inversion code 1612 toggles the state of the inversion FLAG from an initial state of zero. Note that the inversion FLAG also affects literal values. More information is provided hereinbelow during analysis of the bit stream adaptive inversion.

The fourth comma code 1614 is the fixed 6-bit run-length counter with an implied “1” bit. The code length is ten bits (four binary 1110 bits 1616 for the code and six bit places for the 6-bit count 1618 (C₁-C₆)). The fixed 6-bit count 1618 is the binary representation of the decimal value of the number of bits being compressed. The fourth comma code 1614 is a bridge between the second comma code 1606 (i.e., the fixed 3-bit count) and the variable run-length code (a fifth comma code, discussed hereinbelow), and is used when the run length of similar bits is from 12-75 bit places, inclusive. An implied “1” bit is assumed to terminate the run count. This fixed 6-bit run-length counter code has a threshold (offset) of twelve bit places. Six binary bits can represent 2⁶, or decimal sixty-four, distinct values (zero through sixty-three). Therefore, the limit of the code is 12+(2⁶−1)=75 bits.

The fifth comma code 1620 is the variable run length code with an implied “1” bit (also called the “universal” code, since any run length can be encoded using it). The code length is from 17-41 bits, inclusive, and consists of five binary 11110 bits 1622 for indicating the variable run length code, a 5-bit counter modulus 1624 (C₁-C₅), and a variable length field 1626 of 7-31 bits, inclusive. An implied “1” bit is assumed at the end of the run length stream. The fifth comma code 1620 has a threshold of 76 bits and a limit of 2³¹−1 bits. It accomplishes this by “trimming” the counter 1624 length to that which is actually required to represent the run-length count. The trimming is accomplished by a fixed 5-bit field referred to as the “counter modulus.” This comma code is used when the run length of similar bits is from 76 to 2,147,483,647 bit places, inclusive. This variable run-length code has an optimal threshold (offset) of seventy-six bit places.

For example, a bit string of seventy-eight zeros and an implied termination bit of “1” will be represented at the output as 11110 00111 1001110 (spaces added for clarity only). The first five bits (11110) indicate the code 1622 which represents that the variable length comma code (with implied “1” bit) is used; the next five bits (00111) are the 5-bit counter modulus 1624, which is a binary representation of the decimal value for the number of bit places (seven, or binary 111) required in the following variable length field 1626. The variable length field 1626 is a binary representation of the number of bit places compressed. In this example, seventy-eight zeros were compressed, so the binary number placed in the variable length field 1626 is 1001110.
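By way of illustration only, the construction of the fifth comma code can be sketched in Python as follows. The sketch is not part of the disclosed embodiment: the function name variable_run_code is hypothetical, bit fields are modeled as character strings, and the rule that the count field is never trimmed below seven bits is an assumption drawn from the stated field width of 7-31 bits.

```python
def variable_run_code(run_length: int) -> str:
    """Build the variable run-length comma code (implied "1" bit) for a run of zeros."""
    if not 1 <= run_length <= 2**31 - 1:
        raise ValueError("run length outside the range of the code")
    # Trim the count field to the bits actually needed, but never below the
    # stated minimum field width of seven bits (an assumption of this sketch).
    width = max(7, run_length.bit_length())
    prefix = "11110"                      # comma code 1622
    modulus = format(width, "05b")        # 5-bit counter modulus 1624
    count = format(run_length, f"0{width}b")  # variable length field 1626
    return prefix + modulus + count


print(variable_run_code(78))  # the seventy-eight-zero example above
```

For a run of seventy-eight zeros the sketch produces the 17-bit string given in the example, and for the stated limit of 2³¹−1 it produces the maximum code length of 41 bits.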

The sixth comma code 1628 is substantially similar to the fifth comma code 1620, except that there is no implied bit at the end of the bit stream count (i.e., the last bit in the input stream was a “0” bit). This code is used only to encode a bit stream at the end of the input file. A further implication is that the end-of-stream code will immediately follow. The sixth comma code 1628 has a code length of 18-42 bits, inclusive, and consists of six binary 111110 bits 1630 for indicating the variable run length code without an implied bit, a 5-bit counter modulus 1632 (C₁-C₅), and a variable length field 1634 of 7-31 bits, inclusive. The sixth comma code 1628 has an optimal threshold of seventy-six bits and a limit of 2³¹−1 bits. However, it can be used even if the run length is less than the optimal threshold. The sixth comma code 1628 is used only when the last code to output ends in a “0” bit. This code always precedes an end-of-stream code (mentioned in greater detail hereinbelow).

The seventh comma code 1636 serves a dual purpose. In a first instance, it is used to “clean up” any stray bits, as would occur in a partial literal (any number of bits pulled in at the input that is less than eight bits). It is used for end-of-file cleanup where an odd length literal is required to flush out the final bit stream elements of the output. As mentioned hereinabove, the first comma code 1600 is the 8-bit literal which encodes eight bits. Therefore, fewer than eight bits can be encoded with this seventh comma code 1636. The seventh comma code 1636 has a code length of 9-16 bits, inclusive, and consists of six binary 111111 bits 1638 for indicating the variable literal code, a 3-bit counter modulus 1640 (C₁-C₃), and a variable length field 1642 of 0-7 bit places, inclusive. The seventh comma code 1636 has a threshold of four bits. To identify the literal bit stream length, a 3-bit count 1640 follows the code 1638. The actual literal input stream of fewer than eight bits then follows the 3-bit count 1640.

In a second instance, the seventh comma code 1636 provides an end-of-stream termination (EOS) code 1644. The EOS code 1644 has a length of nine bits and is a binary 111111000. The existence of a partial literal of zero length permits the encoding of a unique code to signify the “end-of-stream” for the compressed output. This is the final code applied to a compressed output stream, and is a special case of the variable literal code of the first instance where the length is zero. Bits which are “0” may be appended to this code to bring the output to a fixed 32-bit word for I/O purposes. The comma code types are summarized in the following Table 5.

TABLE 5
Summary of the Run-Length Compression Codes

                                                                  Binary Bit Place Code
Comma Code Type                                                   B0  B1  B2  B3  B4  B5
1. 8-bit literal (minimum 4 bits)                                 0
2. Fixed 3-bit counter (4-11 bits)                                1   0
3. Inversion code (minimum 8 bits)                                1   1   0
4. Fixed 6-bit counter (12-75 bits)                               1   1   1   0
5. Variable run length (implied “1” bit) (76 to 2³¹ − 1 bits)     1   1   1   1   0
6. Variable run length (no implied “1” bit) (76 to 2³¹ − 1 bits)  1   1   1   1   1   0
7. Variable literal (minimum 4 bits)                              1   1   1   1   1   1
End-of-stream termination code                                    **

** Bits B0-B5 of the variable literal with three “0” bits appended (9 bits total).

COMPRESSION EXAMPLE #1

Referring now to FIG. 17, there is illustrated a raw input bit stream with its encoded and compressed output. In order to demonstrate some of the compression codes, the following simple example is offered. Five sections of binary bit patterns are presented as a continuous input stream 1700 of 240 bits, and are broken out as follows for easy discussion. Section 1 consists of three literal 8-bit binary patterns totaling twenty-four bits: a first literal 1702 of binary 01010101, a second literal 1704 of binary 10101010, and a third literal 1706 of binary 11110000. Section 2 consists of a binary 8-bit inversion pattern 1708 (of all “1” bits) to trigger inversion. Section 3 consists of another 8-bit binary pattern, a fourth literal 1710 of binary 10101010. Section 4 consists of a string of one hundred “1” bits 1712. Section 5 consists of a string of one hundred “0” bits 1714.

This raw input stream 1700 is processed in 8-bit blocks and according to the disclosed architecture, resulting in an encoded and compressed output bit stream 1716. In analyzing Section 1, the first literal 1702, second literal 1704, and third literal 1706 are processed using the first comma code 1600 (the 8-bit literal). The respective compressed output codes are a first encoded literal 1718 of binary 001010101, a second encoded literal 1720 of binary 010101010, and a third encoded literal 1722 of binary 011110000. Note that each of the three output codes (first encoded literal 1718, second encoded literal 1720, and third encoded literal 1722) has a code length of nine bits: a leading “0” bit to indicate that the string is an 8-bit literal which is not to be compressed, according to the first comma code 1600, and a body which is the original 8-bit literal. This is an increase of a total of three bits from the input string to the output string in the overall bit count for these three literals (a negative compression scenario). Note also that the third literal 1706 was a possible candidate for inversion with its string of “1” bits, but its run length of four was less than the threshold of eight required for inversion coding to take place. Therefore, inversion did not occur and the bit pattern was treated as a literal.

In analyzing Section 2, the 8-bit inversion pattern 1708 triggers inversion coding (the third comma code 1612), and meets the minimum threshold of eight “1” bits required for inversion coding to take place. The 8-bit inversion pattern 1708 triggers insertion of an inversion code 1724 of binary 110 in the output string. Used in conjunction with the inversion code, the fixed 3-bit run-length count (the second comma code 1606) indicates the total number of “1” bits being inverted. (Note that when inversion occurs, it inverts the succeeding bits in the raw bit stream 1700.) Note also that the fourth literal 1710 following the string of eight “1” bits begins with another “1” bit (see Section 3). Thus, to optimize compression over the maximum run length of similar bits, the leading “1” bit of the fourth literal 1710 is “absorbed” by the 8-bit inversion pattern 1708 for computation of the 3-bit run-length count. Thus, the total run length of “1” bits is nine. Furthermore, the fixed 3-bit run-length count comma code has an implied “1” bit to indicate the end of the stream of similar bits. However, in this case the implied bit is a zero bit in the raw stream, since a zero bit indicates the end of the contiguous stream of one bits. Thus two bits of the succeeding eight bits are processed.

Adding an offset of four, as required when using this comma code, results in a decimal thirteen. However, the value of thirteen cannot be expressed in the three binary bits of the fixed 3-bit count, since three bits can represent only eight distinct values (zero through seven). Applying a modulo eight results in a 3-bit count value of five (or binary 101), which equals the run length of nine less the offset of four. The 3-bit run-length code 1726 inserted at the output as a result of this input string is a binary 10101, where the leading two bits 10 indicate the comma code for the 3-bit fixed run-length counter, and the last three bits 101 indicate the total run length of nine bits (with an offset of four).
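The count arithmetic just described can be checked with a short Python sketch, offered as an illustration only. The function name fixed3_code is hypothetical. The sketch computes the count as worded in the text (add four, then apply modulo eight) and confirms that, for every run length from four to eleven, the result equals the run length less the offset of four.

```python
def fixed3_code(run_length: int) -> str:
    """Build the fixed 3-bit run-length comma code (10xxx) for a run of 4-11 bits."""
    if not 4 <= run_length <= 11:
        raise ValueError("the fixed 3-bit counter covers run lengths 4-11 only")
    count = (run_length + 4) % 8         # as worded in the text
    assert count == run_length - 4       # same value: +4 and -4 agree modulo 8
    return "10" + format(count, "03b")   # comma code "10" followed by the count


print(fixed3_code(9))  # run of nine "1" bits from Section 2
```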

Analysis of Section 3: Since the leading “1” bit of this original set was “absorbed” during inversion coding of the previous Section 2, and an implied bit was also processed, the next eight bits pulled in for processing result in “borrowing” two “1” bits from the string of one hundred “1” bits (now reduced to a string of ninety-eight “0” bits because of inversion). Therefore, the 8-bit string to be processed is 10101011 (before inversion) and 01010100 after inversion. As mentioned hereinabove, since an inversion occurred with the 8-bit inversion pattern 1708, the succeeding bits are also inverted. The string will be encoded as an 8-bit literal 1600. The inverted literal binary code 1728 inserted at the output is 001010100.

Analysis of Section 4: The following run of ninety-eight “0” bits 1712 (previously a run of one bits) offers significant compression possibilities, and results in an output code 1730 of binary 11110001111100010. The run length of ninety-eight zero bits triggers use of the variable run-length comma code with an implied “1” bit 1620. The variable length code 1732 output is 11110 (indicating use of the variable run-length comma code with an implied “1” bit 1620). Following the variable length code 1732 is a fixed 5-bit count 1734 having a binary count of 00111. The fixed 5-bit count 1734 represents the number of bit places required to represent the binary value of the continuous string of similar bits which are compressed. In this case, the count is seven (or binary 00111), indicating that following the fixed 5-bit count 1734 are seven bit places 1736. A modulus of seven is correct since a maximum of seven bits are required to provide a binary representation of decimal ninety-eight. The last seven bits (1100010) represent the binary equivalent of the decimal number ninety-eight for the total run length of contiguous “0” bits. Also associated with this variable run length code 1620 is an implied bit which absorbs a bit from the succeeding string of bits. This leaves ninety-nine bits remaining in the last set.

Analysis of Section 5: Compression of the remaining ninety-nine “1” bits (previously “0” bits prior to the inversion occurring in Section 2) now occurs. A run of at least eight “1” bits triggers inversion. Therefore an inversion code 1738 of 110 is output. Next, run-length compression is performed on the large string of similar bits. Since the bits have now been inverted to all zeros, the variable run length with no implied bit comma code 1628 is used. This variable run-length comma code 1628 is used only when the last code to process has a run length greater than seventy-five bits and ends in a zero. Therefore, the resulting output variable run length string 1740 is 111110001111100011, where the leading six bits 1742 of 111110 represent the variable run-length comma code 1628 with no implied bit; the next five bits 1744 are a binary representation of the counter modulus which, in this example, is seven. A modulus of seven is correct since a maximum of seven bits are required to provide a binary representation of decimal ninety-nine. The actual count 1746 of continuous “0” bits is ninety-nine and has a binary representation of 1100011 (or hex 63).

End of the stream: Finally, an end-of-stream code 1644 is appended at the end. The end-of-stream code 1644 (also represented as a block of bits 1748 here) is 111111000.

The Final Compression Factor: The input bit stream count was 240 bits and the output bit stream count is 91 bits, yielding a compression factor of 91/240, or approximately 38% of the original size of the input bit stream.
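The 91-bit total can be reproduced by tallying the output codes quoted in the section-by-section analysis above. The following Python sketch is illustrative only; the variable names are hypothetical, and the bit strings are copied from the text.

```python
# Output codes of Compression Example #1, keyed by reference numeral.
codes = [
    ("first encoded literal 1718", "001010101"),
    ("second encoded literal 1720", "010101010"),
    ("third encoded literal 1722", "011110000"),
    ("inversion code 1724", "110"),
    ("3-bit run-length code 1726", "10101"),
    ("inverted literal 1728", "001010100"),
    ("variable run-length code 1730", "11110001111100010"),
    ("inversion code 1738", "110"),
    ("variable run-length string 1740", "111110001111100011"),
    ("end-of-stream code 1748", "111111000"),
]

input_bits = 3 * 8 + 8 + 8 + 100 + 100           # Sections 1 through 5
output_bits = sum(len(bits) for _, bits in codes)
factor = output_bits / input_bits                 # output size / input size

print(input_bits, output_bits, f"{factor:.1%}")
```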

COMPRESSION EXAMPLE #2

A more complex example is now discussed wherein the bit stream comprises one hundred 1-bits, one hundred 0-bits, and three hexadecimal values 0xF0, 0xFF, and 0xAA. Upon encountering the first string of one hundred 1-bits, the encoder outputs an inversion code (110) since more than seven contiguous 1-bits exist. Upon triggering the inversion code, the entire bit stream of one hundred 1-bits is toggled to all zeros. Since the contiguous string of now one hundred 0-bits exceeds seventy-five and is not the last code to be output, the variable run length comma code having an implied 1-bit (11110) is used for compression. Therefore, the comma code 11110 is output followed by a 5-bit binary modulus word 00111 (having a decimal value of seven) indicating that the next word following is a count value having seven bit places for providing a binary representation of the decimal value 100 (the total number of “0” bits being converted). The count value output is then 1100100 (decimal 100). Lastly, an implied 1-bit presumably terminates the end of the string of zeros (inverted from a string of 1-bits), so a single “1” bit is absorbed from the succeeding string of bits, leaving a string of ninety-nine 1-bits (inverted along with the earlier inversion code).

The string of now ninety-nine 1-bits is interrogated and triggers an inversion code since a string of 1-bits exceeding seven in number is sensed. A comma code of 110 is then output, followed by a variable length comma code with an implied 1-bit (11110). The 5-bit counter modulus is again seven (00111) followed by a 7-bit binary representation of the decimal string count ninety-nine (1100011). An implied 1-bit operation absorbs the leading bit of the next string, the 0xF0, changing it from a 1111 0000 to 1110 0001, remembering that the encoder looks at 8-bit strings. Therefore, the trailing “1” bit is obtained from the following hex word 0xFF. The new string 1110 0001 triggers an 8-bit literal comma code resulting in a comma code output of “0” followed by the literal 1110 0001.

The encoder encounters the next string, a hexadecimal 0xFF, now missing the leading “1” bit since it was absorbed by the preceding comma code. In order to obtain eight bits, the encoder picks off the leading bit of the following hexadecimal value 0xAA (binary 10101010), a “1” bit, and appends it. The 8-bit binary string is now a 11111111, which triggers the inversion code of 110. The string is compressed using a fixed 3-bit run length counter (with an implied 1-bit). Its comma code is a binary 10, which is output, followed by the run length count (offset by four) of binary 100. Since the run length is actually eight, reducing it by an offset of four results in a decimal value of four (or binary 100). With the inversion code, the binary string of 1111 1111 becomes 0000 0000 with an implied 1-bit absorbed from the last hex word of 0xAA (now down to six bits in length).

The final hexadecimal of 0xAA was inverted from the seven bit string 0101010 to 1010101 and had the leading 1-bit absorbed by the preceding 8-bit string. The resulting string is now six bits in length, a binary string of 010101. This triggers use of a variable literal comma code of binary 111111, which is output and followed by the 3-bit length count of decimal six (binary 110), and the literal code 010101. Since this is the last of the bit stream, an end-of-stream comma code of binary 111111000 is output.

Referring now to FIG. 18, there is illustrated an unbalanced tree decoding technique according to the disclosed embodiment. When traversing the tree according to the decoding process mentioned hereinabove (except that in this particular tree, a “1” bit means a left turn and a “0” bit means taking a right turn) it can be seen that a simple state design can efficiently decode the compressed input stream. Starting from the top of tree 1800, it can be seen that to decode a bit stream having a first bit B0 as a “0” results in taking a right turn off the first node 1802 to a leaf 1804 which has a comma code of “0” (and which represents an 8-bit literal, as mentioned hereinabove). The output is then decoded as an 8-bit literal. This is also summarized above in Table 5. Similarly, a comma code of “10” is decoded by starting at the top of the tree 1800 and following the “1” path (or taking a left turn at node 1802) to a second node 1806. At this node 1806, the bit stream indicates a “0” path should be followed, indicating that a right turn should be made to a leaf 1808. The comma code “10” then results in the output being processed as a fixed 3-bit run-length (with implied “1” bit). Continuing on, the comma code for an inversion is a “110.” A bit stream having this string is decoded by starting at the top of tree 1800, and taking two consecutive left turns at respective nodes 1802 and 1806 (as indicated by the “1” bits). At a node 1810, the “0” bit indicates that a right turn should be taken to a leaf 1812 to decode the output as an inversion code.

A comma code having bits B0-B3 as “1110” indicates a fixed 6-bit run-length counter function. A bit stream having such a bit sequence is decoded by starting at the top of tree 1800 and making three consecutive left turns (per the “1” bits) at nodes 1802, 1806, and 1810. At a node 1814, a right turn is made (in response to bit B3 being a “0” bit) to a leaf 1816 to process the output as a 6-bit run-length implied bit code. Similarly, a comma code having bits B0-B4 as “11110” is decoded by starting at the top of tree 1800 and making four consecutive left turns at nodes 1802, 1806, 1810, and 1814. At a node 1818, a right turn is taken (as indicated by bit B4 being a “0”) to a leaf 1820 to process the output as a variable run-length implied bit code. Continuing on, a comma code of “111110” results in a variable run-length function (without an implied bit) by starting at the top of tree 1800 and making five consecutive left turns through nodes 1802, 1806, 1810, 1814, and 1818. At a node 1822, a right turn is taken (in response to the “0” bit) to a leaf 1824 to process the output as a variable run-length code.

The comma code “111111” is decoded by starting at the top of tree 1800 and taking six consecutive left turns at nodes 1802, 1806, 1810, 1814, 1818, and 1822. At leaf 1826, the output is processed as a variable literal. Note that all of the bit patterns for the respective comma codes mentioned hereinabove are summarized in Table 5.
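The tree traversal described above amounts to counting leading “1” bits until a “0” bit or the sixth “1” bit is reached. A minimal Python sketch of that traversal follows, offered as an illustration only; the names LEAVES and classify are hypothetical, and the compressed stream is modeled as a character string.

```python
# Leaves of the unbalanced tree, indexed by the number of left turns
# ("1" bits) taken before reaching them.
LEAVES = [
    "8-bit literal",                          # 0       leaf 1804
    "fixed 3-bit run length",                 # 10      leaf 1808
    "inversion",                              # 110     leaf 1812
    "fixed 6-bit run length",                 # 1110    leaf 1816
    "variable run length (implied bit)",      # 11110   leaf 1820
    "variable run length (no implied bit)",   # 111110  leaf 1824
    "variable literal",                       # 111111  leaf 1826
]


def classify(bits):
    """Return (code type, comma-code length) for the code at the start of bits."""
    ones = 0
    for bit in bits[:6]:
        if bit == "0":                 # right turn: a leaf is reached
            return LEAVES[ones], ones + 1
        ones += 1                      # left turn: descend one node
    if ones == 6:                      # six left turns: variable literal
        return LEAVES[6], 6
    raise ValueError("bit stream ended before a leaf was reached")


print(classify("110"))
print(classify("111111000"))
```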

Negative Compression

With the disclosed compression technique, it is possible to have negative compression, where the output file will be larger than the input file. The following analysis explores the effects of negative compression as well as the threshold of occurrence. The literal pass-through mode was integrated into the compression algorithm to place a “stop limit” on the size of any negative compression effect. There are two variations of the literal code form: the first is a fixed literal of eight bits which has a comma code of zero; the second is a variable length literal of zero to seven bits, having a comma code of 111111.

The second mode for literal coding is quite inefficient. However, it is only applied to end-of-file “clean up” issues. The fixed-length version (the first mode) is the only form to be reapplied throughout the run-length compression process. As noted previously hereinabove, the fixed 8-bit literal output code 1600 format is a prefix of a single bit “0” followed by the actual 8-bit literal stream. Thus, for every eight bits of raw data in, nine bits of data will go out. This results in a fixed compression factor of 112.5% (that is, 9/8×100%). This is a hard limit which, according to the disclosed embodiment, can never be exceeded.

Positive Compression

It is also useful to know where in the compression analysis positive compression occurs. The shortest code length for any run-length output is the 3-bit counter version. Its format is “10xxx,” where x can be either a “0” or a “1” bit. This code format results in a length of five bits. Therefore, the “break-even” length corresponds to four “0” bits and a “1” bit. Positive compression occurs with a run-length of five (five “0” bits and a “1” bit). The resulting compression factor is 5/6×100%=83% of the original input bit stream size. Conversely, negative compression occurs at a run-length of three and has, as previously mentioned hereinabove, a compression factor of 112.5%.
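The break-even arithmetic can be restated as a short Python sketch, offered as an illustration only. The names fixed3_factor and LITERAL_FACTOR are hypothetical. The sketch takes the compression factor to be output bits divided by input bits, where the input is a run of “0” bits plus its terminating “1” bit and the output is the 5-bit code 10xxx.

```python
from fractions import Fraction

# An 8-bit literal always costs nine output bits for eight input bits.
LITERAL_FACTOR = Fraction(9, 8)


def fixed3_factor(run_length: int) -> Fraction:
    """Output/input size ratio when the 5-bit code 10xxx replaces a run of
    zeros plus its terminating "1" bit."""
    return Fraction(5, run_length + 1)


for run in (4, 5, 11):
    print(run, f"{float(fixed3_factor(run)):.1%}")
print("literal", f"{float(LITERAL_FACTOR):.1%}")
```

A run of four gives a factor of exactly one (break-even), a run of five gives 5/6, and the literal pass-through gives the 112.5% ceiling noted above.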

Bit Stream Adaptive Inversion Analysis

The disclosed compression algorithm is designed to “analyze” the short-term statistics on a binary data stream. This adaptive behavior permits compression to be efficient regardless of the data trend (a stream of “1” bits or a stream of “0” bits). This adaptation process is facilitated by three processing elements: (1) an inversion FLAG; (2) a unique inversion control code; and (3) a run-length bit counter. The inversion FLAG is used to invert the binary stream using an XOR function. If the FLAG is a zero, the data stream remains unadulterated. If the FLAG is set to a one, the stream is inverted (i.e., a “1” becomes a “0” and a “0” becomes a “1”). When the state of the inversion FLAG changes, an inversion control code 1612 is inserted into the compression output data stream. (As indicated hereinabove, the inversion control code is a binary 110.)

The bit counter determines when an inversion can occur. The threshold for inversion is determined by two other codes: (1) inversion code 1612 (of binary 110), and (2) a three-bit counter run-length code 1606 of 10xxx (where x is either a “0” or a “1”). This results in a total bit count of eight. Thus, if a run-length of “1” bits is greater than seven, then an inversion code 1612 is inserted into the output compressed bit stream, and once complete, the run-length code 1606 is then sent out.

Three observations should be noted with regard to stream inversion: (1) literal fields also are inverted if the inversion FLAG is a bit value of one, and the decompression process must take this into account when reconstructing the output stream; (2) initially, the inversion FLAG is set to zero, and as the FLAG is changed from a bit value of zero to one and back (as required), the inversion FLAG remains in its current state unless explicitly switched by a new inversion code; and (3) although data can be inverted, the comma codes are invariant.
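The XOR behavior of the inversion FLAG can be illustrated with a brief Python sketch. The function name apply_flag is hypothetical, and the bit stream is modeled as a character string for readability.

```python
def apply_flag(bits: str, flag: int) -> str:
    """XOR every bit of the stream with the inversion FLAG (0 or 1)."""
    return "".join(str(int(bit) ^ flag) for bit in bits)


literal = "10101011"
print(apply_flag(literal, 0))  # FLAG clear: stream passes through unchanged
print(apply_flag(literal, 1))  # FLAG set: every bit is inverted
```

Because XOR with the same FLAG value is its own inverse, applying the FLAG a second time restores the original bits, which is why a decoder tracking the same FLAG state can recover inverted literal fields.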

Referring now to FIG. 19, there is illustrated a flowchart which defines processing for raw bit-stream encoding. Note that the function of inputting bits implies the tracking of the inversion FLAG. If the inversion FLAG is a “1,” the incoming bit stream is inverted. Processing begins at a start block 1900 and moves to an initialization block 1902 to reset the bit counter to zero and set the inversion FLAG to zero. The program then flows to a function block 1904 to fetch eight bits. The program then flows to a decision block 1906 to determine if an end-of-file has been encountered. If so, program flow moves to a decision block 1908 to determine if there are any partial input bits outstanding (fewer than eight bits were pulled in for processing). If so, program flow moves to a function block 1910 to output a variable literal code with a partial bit stream. Program flow moves from function block 1910 to a function block 1912 to output an end-of-stream code and exit the program, as indicated in block 1914. Referring back to decision block 1908, if there are no partial input bits outstanding, program flows to function block 1912 to output an end-of-stream code, and exits the program as in block 1914.

Referring back to decision block 1906, if an end-of-file code has not been encountered at the input, program flow moves to a decision block 1916 to determine if the first four bits are zeros. If so, program flow moves to a function block 1918 to force the second four bits back to the input. The bit counter is then set to four, as indicated in function block 1920, and program flow continues on to function block 1922 to process the run length. From function block 1922, the program flows to a decision block 1924 to determine if an end-of-file code has been encountered. If so, program flows to function block 1912 to output an end-of-stream code and exit the program, as indicated in block 1914. If an end-of-file has not been encountered, the program flows from decision block 1924 back to the input of function block 1904 to fetch eight more bits.

Referring back to decision block 1916, if the first four bits are not zeros, program flows to another decision block 1926 to determine if all eight bits are ones. If so, program flows to a function block 1928 to set the bit counter to eight, and toggle the inversion FLAG, as indicated in function block 1930. Program flows then to a function block 1932 to output an inversion code. The program then flows to function block 1922 to process the run length. Referring back to decision block 1926, if the eight bits are not all ones, program flow moves to a function block 1934 to output a zero bit. The program then outputs an eight-bit literal string, as indicated in function block 1936. The program then flows back to the input of function block 1904 to fetch eight more bits and continue the encoding process.

Referring now to FIG. 20, there is illustrated a flowchart of the sequence of steps for run-length processing as a subroutine of the main encoding function. The process begins at a start block 2000 and moves to a function block 2002 where bits are input to the process. Program flow then moves to a decision block 2004 to determine if an end-of-file code has been received. If an end-of-file code has been received, program flow moves to a function block 2006 where a comma code 111110 is output. This code represents an output variable run-length without an implied “one bit.” Program flow then moves to a block 2008 to exit the subroutine. If an end-of-file code has not been received, program flow moves out of decision block 2004 to decision block 2010 to determine if the bit is equal to a binary one. If the bit is not equal to a binary one, the program flow moves out of decision block 2010 to a function block 2012 to increment the bit counter, from which it then loops back to the input of function block 2002 to input more bits. If the bit was a one bit, program flow moves out of decision block 2010 to a decision block 2014 to determine if the run length is less than twelve.

If the run length is less than twelve, program flow moves to a function block 2016 to output a comma code (binary 10) indication of a three-bit run length with an implied one bit. Program flow then continues on to a function block 2024 where the program performs a normal exit back to the main encoding program. If the run length is greater than or equal to twelve, program flow moves out of decision block 2014 to a decision block 2018 to determine if the run length is less than seventy-six. If the run length is less than seventy-six, program flow moves to a function block 2020 to output a comma code (binary 1110) which represents a six-bit run-length with an implied one bit. Program flow then moves from function block 2020 to function block 2024 to exit normally. If the run length is seventy-six bits or more, program flow moves from decision block 2018 to a function block 2022 to output a comma code (binary 11110) which represents a variable run length with an implied one bit. Program flow moves from function block 2022 then to block 2024 where the program performs a normal exit.
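The flow of FIGS. 19 and 20 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the disclosed implementation: the names encode, run_code, and variable_field are hypothetical; bit streams are modeled as character strings; the stored counts (run length less four for the 3-bit counter, run length less twelve for the 6-bit counter, and the unmodified run length for the variable codes) are inferred from the worked examples and the stated code limits; and the variable count field is assumed never to be narrower than seven bits.

```python
INVERT = str.maketrans("01", "10")
EOS = "111111000"


def variable_field(run):
    # 5-bit counter modulus followed by the count (assumed >= 7 bits wide).
    width = max(7, run.bit_length())
    return format(width, "05b") + format(run, f"0{width}b")


def run_code(run):
    # Select the implied-bit run-length comma code by run length (FIG. 20).
    if run < 12:
        return "10" + format(run - 4, "03b")
    if run < 76:
        return "1110" + format(run - 12, "06b")
    return "11110" + variable_field(run)


def encode(raw):
    out, pos, flag = [], 0, 0
    while True:
        chunk = raw[pos:pos + 8]
        if flag:
            chunk = chunk.translate(INVERT)   # input is read through the FLAG
        if len(chunk) < 8:                    # end of file (blocks 1906-1914)
            if chunk:                         # partial input bits outstanding
                out.append("111111" + format(len(chunk), "03b") + chunk)
            out.append(EOS)
            return "".join(out)
        if chunk.startswith("0000"):          # block 1916
            pos += 4                          # second four bits go back
            run = 4
        elif chunk == "11111111":             # block 1926
            pos += 8
            run = 8
            flag ^= 1                         # toggle the inversion FLAG
            out.append("110")
        else:
            out.append("0" + chunk)           # 8-bit literal
            pos += 8
            continue
        # Run-length subroutine: count bits that read as "0" through the FLAG.
        terminator = "0" if flag else "1"     # raw bit that reads as "1"
        while pos < len(raw) and raw[pos] != terminator:
            run += 1
            pos += 1
        if pos == len(raw):                   # run reached the end of file
            out.append("111110" + variable_field(run))
            out.append(EOS)
            return "".join(out)
        pos += 1                              # absorb the implied "1" bit
        out.append(run_code(run))


example_1 = ("01010101" + "10101010" + "11110000" + "1" * 8
             + "10101010" + "1" * 100 + "0" * 100)
print(len(example_1), len(encode(example_1)))
```

Applied to the 240-bit input of Compression Example #1, the sketch yields a 91-bit output consisting of the same codes given in that example.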

Referring now to FIG. 21, there is illustrated a flowchart of the decode process. In the decode process, the leading bits (the comma code bits) are interrogated to determine the particular comma code. If the leading bit is a 0, an 8-bit literal is to follow. If the leading bit is a 1, the next bit is interrogated to determine if it is a 3-bit counter or an inversion, working down the list of comma codes until a match is found. Implied with the function of outputting a bit is the requirement that the output bit stream should be inverted if the inversion flag is a “1” bit. Also what is not shown, but assumed to exist, is the assembly and disassembly of bit words to individual bits in both the encoding and decoding processes. The process starts at a function block 2100 and continues on to an initialization block 2102 where the inversion FLAG is cleared. Note that at this point the flowchart will follow the decoding process as discussed in relation to the binary tree of FIG. 18. After the inversion FLAG has been cleared in function block 2102, program flows to a function block 2104 to input a bit. The process then interrogates the bit stream on a bit-by-bit basis. Program flow then moves to a decision block 2106 to determine if the bit which has been input is a binary “1.” If not, the program flows to a function block 2108 to fetch eight bits, and then to a function block 2110 to process the output as an 8-bit literal. If the bit is a “1,” as determined in decision block 2106, flow moves to a function block 2112 to input a second bit.

If the second bit is not a “1” as determined in decision block 2114, the program flows to a function block 2116 to fetch a 3-bit run-length implied bit code. (This is the same as arriving at leaf 1808 of FIG. 18.) At this point, program flow moves to function block 2118 to output zeros and then on to a function block 2120 to output a one bit. At this point, this particular 3-bit code process is complete. Referring back to decision block 2114, if the second bit input is a “1” bit, program flows to a function block 2122 to input a third bit. The third bit is interrogated by decision block 2124 to determine if it is a “1” bit. If not, that indicates that the first three bits comprise a binary 110, which is the comma code for an inversion code. Therefore, program flow moves to function block 2126 to toggle the inversion FLAG from the initialized setting of “0” to a “1” bit. At this point, having received a “0” bit, the three bits received up to this point represent an inversion code (a comma code of binary 110) and therefore the output is inverted accordingly. If the third bit is not a “0” bit, program flow moves from decision block 2124 forward to a function block 2128 to input a fourth bit.

The fourth bit is then interrogated by decision block 2130 to determine if it is a “1” bit. If not, the program flows to a function block 2132 to fetch the 6-bit run-length implied bit code. Flow moves then to a function block 2134 to output zeros, and then to a function block 2136 to output a “1” bit. Since a “0” bit has been received at this point, the processing stops on this branch. On the other hand, if the fourth bit is a “1” bit, program flow moves to a function block 2138 to input a fifth bit. The fifth bit is then interrogated by a decision block 2140 to determine if the fifth bit is a “1.” If not, it must be a “0” bit and program flow moves to a function block 2142 to input a five-bit modulus. Program flow then moves to a function block 2144 to input a variable length count, and then on to a function block 2146 to output zeros. In function block 2148, a “1” bit is then output. Since the fifth bit was a zero, processing stops after completion of this branch. On the other hand, if the fifth bit was a one, as interrogated by decision block 2140, program flow moves to a function block 2150 to input a sixth bit.

If the sixth bit is not a one, as determined by decision block 2152, program flow moves to a function block 2154 to input a five-bit modulus and then on to function block 2156 to input a variable length count. Program then flows to function block 2158 to output zeros. At this point, since the sixth bit was a “0” bit, the output code decoding is completed on this branch. On the other hand, if the sixth bit was determined to be a “1” bit, program flow moves from decision block 2152 to a function block 2160 to input a three-bit literal count. Program then flows to a decision block 2162 to determine if the count is equal to zero. If the count is not equal to zero, program flows to a function block 2164 to output the literal string and exit the process. If, on the other hand, the count does equal zero, program flows to a function block 2156 to exit the process.

Referring now to FIG. 22, there is illustrated a block diagram of a companding system. To enhance the query throughput even more, the companding system 2200 may be structured to handle large numbers of queries from one or more databases. Such a configuration is a reality in companies having large telephone support departments, which can be financial institutions, computer support operations, or any function requiring large numbers of nearly simultaneous database queries. These databases may be located independently over data networks such as LANs, WANs, or even global communication networks (e.g., the Internet). In any case, large numbers of database queries present a heavy load on systems. It can be appreciated that a system having independent multichannel relational processing capability would greatly enhance query throughput. While one relational engine is performing recursive operations, another may be expanding or compressing super collections, and still another relational engine may be performing thread conversion to collections. Therefore, the companding system comprises a number of relational engine circuits which can perform independently or cooperatively on a number of incoming database queries.

The companding system 2200 provides such a system and comprises one or more relational engine circuits (1, 2, . . . , N) 2202, 2204, and 2206 interfacing through respective interface circuits 2208, 2210, and 2212 to a common bus 2214. The common bus 2214 may be any bus architecture, for example, a PCI bus used in computer systems. The common bus 2214 may have any number of devices connected thereto, but in this example, a CPU 2216 having an associated memory 2218 is used to process records stored on a database 2220. (Note that the CPU 2216, memory 2218, and database 2220 are similar to the CPU 406, memory 408 and database 410 mentioned hereinabove.) It should also be noted that the disclosed architecture is not limited to a single CPU 2216, but is also operable to work with a plurality of CPUs 2216 (e.g., also CPU 2224), memories 2218, and databases 2220 (e.g., also database 2221). Each relational engine circuit (2202, 2204 and 2206) comprises a plurality (1, 2, . . . , N) of input channels 2222 for conversion of threads to collections, recursive processing, and companding of input streams.

It can be appreciated that loss of any bit of the encoded bit stream will destroy the effectiveness of the compression technique. Therefore, error detection techniques such as CRC should be used when transmitting over great distances (e.g., computer networks). Furthermore, all compression can be done using the universal comma codes; however, the efficiency increases by adding the 3-bit and 6-bit comma codes.

Referring now to FIG. 23, there is illustrated an alternative embodiment where all input channels have decompression capabilities. The relational engine circuitry 2300 interfaces to a PCI bus 402 via a PCI bridge circuit 404. The PCI bus architecture is commonly found in a wide variety of personal computers and workstations. The PCI bus 402 is a 32-bit wide local bus employed in many personal computers and workstations for the transfer of data between the PC's main CPU and periphery, such as hard disks, video cards or adapters, etc. Effective transfer speeds across the PCI bus 402 may reach up to 132 megabytes per second. (It should be noted that this architecture is not limited to a PCI bus architecture but is applicable to any architecture which provides the proper interfacing to the relational engine circuitry 400.) The relational engine circuitry 400 interfaces through the PCI bridge 404 to a CPU 406 on the PCI bus 402. The CPU 406 has associated with it a memory 408 for storing data and furthermore, has associated with it and attached to the PCI bus 402, a storage unit 410 for the mass storage of files, including a database of records. A user wishing to query the database of records stored in storage unit 410 enters the key field information into the CPU 406. The CPU 406 then performs the query and places the query results into the memory 408. The relational engine circuitry 400 then retrieves the search results directly from the memory 408 through a direct memory access (DMA) process across the PCI bus 402 along a path 412 to memory 408, or indirectly through the CPU 406. Note that the disclosed architecture is not limited to DMA but may incorporate any memory-accessing process.

The PCI controller 404 provides the bus interface function between the external peripherals and the relational engine circuitry 400. In this particular embodiment, each channel processor (500, 502, 504, 506) has associated therewith a FIFO (2302, 2304, 2306, and 2308, respectively). The 32-bit wide FIFOs (2302, 2304, 2306, 2308) facilitate decompression of bit streams prior to entry to the respective channel processors (500, 502, 504, and 506). The output of the channel processors is a single-bit wide stream to the relational processor 416. Timing and control of the relational processor 416 is provided by timing and control circuitry 428. Additionally, the timing and control circuitry 428 provides synchronization signals to the channel processors (500, 502, 504, and 506), and to an output interface block 2310. The output of the relational processor 416 is a single-bit wide stream to a conversion interface circuit 2312 for converting to either a serial-to-parallel output or a bit position-to-integer output. After conversion, the output of the conversion interface is fed to the compander 418. Output of the compander 418 can be directed to any of the channel processors (500, 502, 504, and 506), or the output interface 2310 via a bus interface 2314. Compression is accommodated through the output interface 2310 to the FIFO 534. The compressed stream is then sent through the controller 404 to external points. The compander 418 works in conjunction with the FIFO controller 1100 and associated DRAM memory 1108 to facilitate compression/decompression of bit streams. A 16-bit wide memory 2316 is also accessible via the bus 2314 by the compander 418 for manipulation of 16-bit wide processing. A boot loader 2318 placed between the controller 404 and the bus 2314 facilitates booting of the relational engine 400, and more particularly the relational processor 416. Updates from the host are downloaded to the boot loader 2318, and the relational processor 416 uploads the new code for execution.

Enhanced Boolean Processor With Parallel Inputs

Referring now to FIG. 24, there is illustrated a general block diagram of the processor of FIG. 5. This relational engine incorporates a parallel-to-serial conversion whereby four 32-bit wide parallel inputs 2400, 2402, 2406 and 2408 are each input to respective parallel-to-serial converters 500, 502, 504 and 506, the output of each converter then being a serial bit stream 1 bit wide which is then fed into the Boolean processor 416. The Boolean processor 416 then operates on the serial bit streams according to a Boolean OpCode input to the Boolean processor at OpCode input 1003 and outputs another serial bit stream to the serial-to-parallel converter 516 for ultimate output of a 32-bit wide word.

Referring now to FIG. 25A, there is illustrated a general block diagram of an enhanced Boolean processor having four 32-bit wide parallel inputs (although any number of inputs can be provided under constraint of design considerations), according to a disclosed embodiment. This disclosed embodiment does away with the parallel-to-serial converters 500, 502, 504, and 506 of each of the respective 32-bit word inputs 2400, 2402, 2406 and 2408 of FIG. 24. A Boolean processor block 2501 (comprising 32 bitmap memories 0-31) is then provided to receive multiple parallel inputs, in this particular case, 32-bit wide parallel paths 2500, 2502, 2504 and 2506 for accommodating the respective 32-bit wide words. The Boolean processor 2501 operates according to Boolean OpCodes input over an OpCode input line 1003 to effectively output 32-bit parallel words at its output.

The primary difference between the processors of FIGS. 24 and 25A is that of serial versus parallel word processing, respectively. In the architecture of FIG. 24, the Boolean operations on the four channel-output data streams are processed one bit at a time until all 32 bits are complete. The basic concept behind the enhanced Boolean processor is to process the Boolean operations on all 32 bits of an “uncompressed collection” simultaneously. There are three types of data streams that a file input to a channel can conform to: threads, compressed collections, and uncompressed collections. A sorted thread is a data stream that represents the index of a database record meeting the search criteria. The thread must be sorted in ascending order for proper operation. Each index is represented as a 32-bit integer. The compressed collection is a run-length compressed stream of data, which when uncompressed will represent the collection. The collection is a contiguous stream of data (i.e., a bit vector) representing the record position. Each bit position in the data stream indicates the index of the associated record. Threads are the data streams which are the most difficult to handle. A thread is serial in nature and must first be converted into an uncompressed collection. Compressed collections must also be converted to uncompressed collections, but unlike threads, the conversion process is not as complex. Finally, uncompressed collections are the simplest form of data stream to process since no additional formatting issues are presented when in an uncompressed collection form. In the enhanced processor, each channel processor subsection includes a parallel-to-serial converter, which takes the parallel input feed and converts it into a serial feed for further processing by the Boolean processor. Once all of the processing is completed, the serial data stream is converted back into a parallel feed for output to the host PC.

Referring now to FIG. 25B, there is illustrated a more detailed block diagram of the disclosed enhanced processing architecture. All input channels of the disclosed architecture now have recursion and decompression capability; therefore, respective multiplexers (Mux A, Mux B, Mux C, and Mux D) are provided at respective subsection inputs for selecting either a standard 32-bit input from the DCA (data, control, and address) bus 420 (e.g., PCI), a recursion function, or a decompression function. The output of each of the multiplexers (A, B, C, and D) is a 32-bit word transmitted across respective 32-bit buses to the respective subsections (A, B, C, and D), also designated as 2508, 2510, 2512 and 2514. Each of the multiplexers A, B, C, and D of the respective subsections (2508, 2510, 2512 and 2514) has mode register (MR) inputs for selecting the operating mode according to commands sent from the mode register 536 across the DCA bus 420. Since the subsections (2508, 2510, 2512 and 2514) each contain a parallel-to-serial function, the output of each of the subsections (2508, 2510, 2512 and 2514) is a serial bit stream to 32 bitmap memory processors (designated bitmap memory processor (0, 1, 2, . . . , 31)). Therefore, each of the four inputs of the bitmap memory processors (0, 1, 2, . . . , 31) receives one bit from each of the respective four subsections for processing. The resulting output of each of the bitmap processors (0, 1, 2, . . . , 31) is a single bit which is clocked into a register 2516. For example, a first bitmap memory 2518 outputs a single bit (designated Bit 0) into the 32-bit register 2516, a second bitmap memory 2520 outputs a single bit (designated Bit 1) into the 32-bit register 2516, and so on. To facilitate the entry of the four single bits from the respective subsections into each of the bitmap memories (0, 1, 2, . . . , 31), an input control circuit (not shown) synchronizes the clocking of the four single-bit outputs of the subsections (2508, 2510, 2512 and 2514) into the appropriate bitmap memories (0, 1, 2, . . . , 31). The 32-bit word in the register 2516 is then clocked out to either the FIFO 522 or the compander subsection 418 for processing. Notably, the disclosed architecture is scalable to larger regimes (e.g., 64 bit, 128 bit, etc.) by increasing the number of bitmap memories and input subsections, along with the necessary support chips to handle such larger architectures.
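The bit-slice arrangement described above can be modeled in software as a rough sketch. The OpCode encoding is not given in this description, so the 16-entry truth table used here (`opcode`), the function name `bit_slice_op`, and the constant `AND_AB` are assumptions made purely for illustration. Each loop iteration stands in for one of the 32 bitmap memories, which takes one bit from each of the four subsections and contributes one bit to the 32-bit result register.

```python
MASK32 = 0xFFFFFFFF


def bit_slice_op(opcode, a, b, c, d):
    """Apply a 4-input Boolean function to four 32-bit words.

    opcode is a hypothetical 16-bit truth table: bit k of opcode is the
    output for the input combination k = (d << 3) | (c << 2) | (b << 1) | a.
    """
    result = 0
    for pos in range(32):  # one iteration per bitmap memory 0..31
        k = ((a >> pos) & 1) \
            | (((b >> pos) & 1) << 1) \
            | (((c >> pos) & 1) << 2) \
            | (((d >> pos) & 1) << 3)
        result |= ((opcode >> k) & 1) << pos  # Bit `pos` of the register
    return result & MASK32


# Hypothetical truth table for "A AND B", ignoring inputs C and D.
AND_AB = sum(1 << k for k in range(16) if (k & 1) and (k & 2))
```

In hardware the 32 positions would be evaluated at once rather than in a loop; the loop only makes the per-position lookup explicit.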

If the value of register 2516 is to be used for recursion processing, the 32-bit word from register 2516 is passed into FIFO 522 and thereon to the dual concurrent FIFO 524 for temporary buffering. At the proper time, and under control of the mode register 536, the value of the dual concurrent FIFO is then passed to any one of the four input multiplexers (Mux A, Mux B, Mux C, or Mux D) according to instructions from the mode register 536. Alternatively, the value in the dual concurrent FIFO 524 can be passed through the 2-to-1 multiplexer 532 and thereon to the FIFO 534 for output onto the DCA bus 420. On the other hand, if the value in the register 2516 were to be compressed, it would be clocked into the compander subsection 418 for compression and therefrom to the 2-to-1 multiplexer 532 through the FIFO 534 and onto the DCA bus 420.

If an incoming word from the DCA bus 420 needs to be decompressed, it is passed through FIFO 528 to the compander subsection 418 for decompression. The resulting decompressed value can then be input to any one or more of the input multiplexers A, B, C, or D as selected by the mode register 536. The uncompressed output word of the compander subsection 418 can also be passed to the DCA bus 420 via the 2-to-1 multiplexer 532 and FIFO 534, as determined by the mode register 536.

The mode register 536 receives signals across the DCA bus 420 to operatively control devices in the disclosed architecture. For example, input multiplexers A, B, C, and D receive mode register control via mode register outputs MR1, MR2, MR3 and MR4, respectively. Similarly, the subsections 2508, 2510, 2512 and 2514 receive mode register outputs MR5, MR6, MR7 and MR8, respectively. The dual concurrent FIFO 524 receives the MR10 mode register control, the compander subsection 418 receives the MR10 register signal, and the 2-to-1 multiplexer 532 receives the MR9 register control output signal. In addition, the state control machine 538 provides one or more control inputs to other various devices not shown. For example, as illustrated in FIG. 5 hereinabove, the integer counter 546 receives the I-Inc signal, the record count block 540 receives the R-Inc signal, the binary counter 544 receives the B-Inc output signal, and the dual concurrent FIFO 524 receives the CRD read signal and the CWR write signal from the state control machine 538. Each bitmap memory (0, 1, 2, . . . , 31) receives the OpCode input 1003 for processing of the 32-bit words received from the respective subsections 2508, 2510, 2512 and 2514 in accordance therewith.

Referring now to FIG. 25C, there is illustrated a high-level overview of the Boolean processing embodiment. The record identification scheme uses a 32-bit word 2540 comprised of 5 bits that identify the position of a single bit which is set (Bit Position ID, or BP ID), 26 bits of tag ID (Tag ID), and a single remaining sign bit, which is discarded. The Tag ID identifies the particular fragment number with which the database record is associated. For example, if a database has 100,000 records, the first 32 bits or records (which comprise a word) are designated as fragment #1, the next 32 bits as fragment #2, and so on. Therefore, one or more bits set within fragment #1 need to be identified by some mechanism for ultimate processing by the disclosed architecture. Those bits that are set (which identify records which meet the search criteria) within the 32-bit fragment are coded using one or more of the Tag ID/BP ID words 2540. Where more than one bit is set in a 32-bit fragment, more than one word 2540 is needed to completely identify all records in that 32-bit fragment which meet the search criteria by having a bit set. This will be discussed in greater detail hereinbelow with respect to FIG. 25D.

Referring now to FIG. 25D, there is illustrated a general conversion process, according to a disclosed embodiment. A bit stream of 1, 2, 3, . . . , N bits 2550 is illustrated having database records 20, 22, and 23 with bits set indicating a match of the search criteria. For example, if the search criteria were all people in a database of 100,000 records having red hair, records 20, 22 and 23 could be those that match the search criteria. Those three records would then have a bit set in the respective bit positions 20, 22, and 23. If other search criteria were used, for example, all of the females in the database of 100,000 records, another bit stream of 100,000 bits would be generated with those records meeting the matched search criteria having a bit set in those respective bit positions. The Boolean processor would then perform a Boolean operation on the two 100,000-bit streams to obtain the result for all females having red hair. If any or all of the records 20, 22, and 23 were those of a female having red hair, the record(s) would be returned as a result.
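As a minimal sketch of the red-hair/female example, the two bit streams can be modeled as Python integers used as bit vectors, with the Boolean operation being a bitwise AND. The helper names and the record numbers chosen for the "female" stream are hypothetical and serve only to make the example concrete; only records 20, 22, and 23 come from the description above.

```python
def bitmap_from_records(records):
    """Build a bit vector with one bit set per matching record index."""
    bits = 0
    for r in records:
        bits |= 1 << r
    return bits


def records_from_bitmap(bits):
    """List the record indices whose bits are set, in ascending order."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]


red_hair = bitmap_from_records([20, 22, 23])    # records from the example
female = bitmap_from_records([5, 22, 23, 40])   # hypothetical second query
both = red_hair & female                        # Boolean AND of the streams
```

Under these assumed inputs, `records_from_bitmap(both)` yields records 22 and 23, the records present in both streams.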

To properly designate those records within a 32-bit fragment that have a bit set, the word structure 2540 of FIG. 25C is used extensively. Therefore, since three records 20, 22, and 23 each have a bit set within fragment #1, three words 2552, 2554, and 2556, respectively, are used. Word 2552 has a Tag ID representing fragment #1, and a BP ID of 10100 (a decimal 20 for record 20). Word 2554 has a Tag ID, again, of fragment #1 (since the record is a part of fragment #1) and a BP ID of 10110 (a decimal 22 for record 22). Word 2556 represents the third bit set within fragment #1 by having a Tag ID of fragment #1 and a BP ID of 10111 (a decimal 23 for record 23). The disclosed architecture then returns the resulting stream of words 2558 which capture the records meeting the search criteria.

Referring now to FIG. 26, there is illustrated the basic data flow of the thread-to-uncompressed collection conversion process. The thread data is passed from a host PC into a 4K×32-bit Primary FIFO 2600. The output of the Primary FIFO 2600 is fed to a Fragment Converter and Bit Accumulator (FCBA) 2602 for processing of thread component IDs, which will be discussed in greater detail hereinbelow. The output of the FCBA 2602 is then passed to a Secondary FIFO 2604 which accommodates 58-bit wide words for transfer to a fragment-to-collection converter 2606 to be processed into a collection which is a 32-bit serial word.

Referring now to FIG. 27, there is illustrated a simplified data structure of a 32-bit thread in the Primary FIFO. Upon receipt of the 32-bit thread from the Host Feed into the Primary FIFO 2600, the thread is structured into three bit groups: a sign bit which is bit 31, a Tag ID bit group of 26 bits comprising bits 5-30, and a bit position (BP) ID having 5 bits comprising bits 0-4. The Primary FIFO 2600 is operable to accommodate multiple input threads 2700 which are structured into respective ID components such that the Tag ID is passed across an output bus 2702, and the BP ID is passed along an output bus 2704 to respective components of the FCBA 2602.
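The three bit groups can be separated with shifts and masks, as in the following sketch. The function name `split_thread` is hypothetical. The code treats the Tag ID as a zero-based number, so the first 32 records yield a Tag ID value of 0; whether that value corresponds to what the text calls fragment #1 is an assumption here.

```python
def split_thread(word):
    """Split a 32-bit thread into (tag_id, bp_id), discarding the sign bit."""
    word &= 0x7FFFFFFF                  # drop sign bit 31
    tag_id = (word >> 5) & 0x3FFFFFF    # 26 bits, bits 5-30
    bp_id = word & 0x1F                 # 5 bits, bits 0-4
    return tag_id, bp_id


# Record indices 20, 22, and 23 all fall in the same 32-record fragment.
for record in (20, 22, 23):
    tag_id, bp_id = split_thread(record)
    print(record, tag_id, format(bp_id, "05b"))
```

Because the BP ID is simply the low five bits of the record index, indices 20, 22, and 23 give BP IDs of 10100, 10110, and 10111.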

Referring now to FIG. 28, there is illustrated a block diagram of a simplified fragment converter and bit accumulator. The FCBA 2602 comprises, as its primary circuit blocks, a Tag ID comparator 2800 and a bit accumulator section provided by the cooperative function of a 1-of-32 decode logic circuit 2802 and an R/S Flip-Flop array 2804. After receiving threads from the Host Feed into the Primary FIFO 2600, a Ready signal is issued from the Primary FIFO 2600 across a Ready signal path 2808 to the Tag ID comparator 2800. The Tag ID comparator 2800 then issues a Fetch command across a Fetch signal path 2806 to the Primary FIFO 2600 to initiate loading of the next Tag ID 26-bit word across the Tag ID path 2702. The BP ID 5-bit word of the Primary FIFO 2600 is passed to the 1-of-32 decode logic 2802 across the BP ID path 2704. The 1-of-32 decode logic 2802 also receives a Set Bit signal from the Tag ID comparator 2800 across a Set Bit connection 2810. At the proper time, the 1-of-32 decode logic 2802 passes a 32-bit word to the R/S Flip-Flops 2804 according to the BP ID received from the Primary FIFO 2600. The R/S Flip-Flops 2804 will accumulate all of the bits which have been set according to records matching the search criteria, within a specific thread input. For example, if three record positions have bits set within the 32-bit thread, eventually, three corresponding bits will be set in the R/S Flip-Flops 2804. Additionally, the Tag ID comparator 2800 provides a Load command across a Load signal line 2812 to the Secondary FIFO 2604 to enable loading of the 26-bit word from the Tag ID comparator 2800 into the Secondary FIFO 2604 across a bus 2814. At the proper time, the Secondary FIFO 2604 also receives the value of the R/S Flip-Flops 2804 across a 32-bit bus 2816. At this time, the Secondary FIFO 2604 contains a 58-bit wide word comprising a 26-bit Tag ID and a 32-bit fragment. The Tag ID word and fragment words are then passed to the fragment-to-collection converter 2606 for ultimate output of a 32-bit collection.

Referring now to FIG. 29, there is illustrated a simplified block diagram of the Tag ID comparator 2800 of FIG. 28. The simplified block diagram comprises a state machine 2900, a 26-bit register 2902, and a 26-bit comparator 2904. The Tag ID comparator 2800 operates according to the following general steps:

Step 1: When the state machine 2900 detects that initial thread data has been loaded into the Primary FIFO 2600 (via the Primary FIFO Not Empty line), the state machine 2900 will be cleared and the Primary FIFO Fetch signal will be asserted.

Step 2: The state machine 2900 then generates a Load signal to the 26-bit register 2902, and a 26-bit Tag ID is then loaded into the 26-bit register 2902.

Step 3: The state machine 2900 then generates a Set Bit signal.

Step 4: The state machine 2900 then generates the Primary FIFO Fetch signal to the Primary FIFO 2600 to return a next 26-bit Tag ID.

Step 5: The next Tag ID is then loaded into the 26-bit comparator 2904.

Step 6: If the values in the 26-bit register 2902 and the 26-bit comparator 2904 are the same, a Tag Match signal is sent back to the state machine 2900 causing the state machine 2900 to repeat Steps 3-6, if there is more data in the Primary FIFO 2600. The Tag Match signal is asserted when more than one record within a thread has a bit set. Thus successive records will have the same Tag ID, and a successful comparison generates the Tag Match signal. If the values are not the same and the Secondary FIFO 2604 is not full, the state machine 2900 generates Output Clock and Secondary FIFO Load signals, which then cause the contents of the 26-bit register 2902 and the R/S Flip-Flops 2804 to be output to the Secondary FIFO 2604. The unsuccessful match of the Tag IDs indicates that all fragments having the same Tag ID have been accounted for, and the corresponding record bits accumulated in the R/S Flip-Flops 2804, which are then passed to the Secondary FIFO 2604. Steps 2-6 will then be repeated, if there is data available in the Primary FIFO 2600.
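The steps above can be approximated in software as follows. This is only a behavioral sketch, not a model of the state machine's signals: `current_tag` plays the role of the 26-bit register, `accumulator` plays the role of the R/S Flip-Flops, and the returned list stands in for the Secondary FIFO. The function name `threads_to_fragments` is hypothetical, and the input is assumed to be sorted in ascending order, as the description requires of threads.

```python
def threads_to_fragments(threads):
    """Group sorted 32-bit threads into (tag_id, 32-bit fragment) pairs."""
    secondary = []          # stands in for the Secondary FIFO
    current_tag = None      # stands in for the 26-bit register
    accumulator = 0         # stands in for the 32 R/S flip-flops
    for word in threads:
        tag_id = (word >> 5) & 0x3FFFFFF
        bp_id = word & 0x1F
        if current_tag is not None and tag_id != current_tag:
            # Tag mismatch: emit the finished fragment, then reset.
            secondary.append((current_tag, accumulator))
            accumulator = 0
        current_tag = tag_id
        accumulator |= 1 << bp_id   # 1-of-32 decode, then set that bit
    if current_tag is not None:
        secondary.append((current_tag, accumulator))  # flush last fragment
    return secondary
```

For example, the threads 20, 22, 23, and 70 would produce two fragments: one with bits 20, 22, and 23 set under the first Tag ID value, and one with bit 6 set under the Tag ID value two higher.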

Referring now to FIG. 30, there is illustrated a block diagram of the bit accumulator of FIG. 28. The bit accumulator function is performed using the R/S Flip-Flop section 2804 having thirty-two R/S Flip-Flops 3000, each of which receives an input from the 1-of-32 decoder 2802, the other portion of the bit accumulator. In larger architectures having more than thirty-two bits, the number of flip-flops will be scaled up accordingly. In operation, when the state machine 2900 generates the Primary FIFO Fetch signal, the 5-bit BP ID is loaded across the BP ID signal path 2704 into the 1-of-32 decoder 2802. Upon receipt of a Set Bit signal from the state machine 2900, the 1-of-32 decoder 2802, which has decoded the 5-bit BP ID, signals which of the thirty-two R/S Flip-Flops 3000 will be set to a “1” at its output, which indicates a record index. As mentioned hereinabove, if more than one record within the thread has a bit set, those additional bits are also accumulated in the R/S Flip-Flops 2804. When the state machine 2900 generates a Secondary FIFO Load signal, the outputs of the R/S Flip-Flops 2804 (having R/S Flip-Flops 3000) are loaded into the lower 32 bits of the Secondary FIFO 2604 as a 32-bit fragment. Each of the outputs of the 32 R/S Flip-Flops 3000 is then reset to “0” upon receipt of the Load signal from the state machine 2900 in preparation to load the next Tag ID into the 26-bit register 2902 of the Tag ID comparator subsection 2800. The process then repeats until the Primary FIFO 2600 does not contain any additional data.

Referring now to FIG. 31, there is illustrated a simplified data structure of a 58-bit word in the Secondary FIFO 2604. As indicated hereinabove, the inputs to the Secondary FIFO 2604 are primarily the 26-bit Tag ID comprising bits 32-57 from the Tag ID comparator 2800 across the Tag ID path 2814, and the 32-bit word from the R/S Flip-Flops 2804 across the 32-bit path 2816, which then becomes a fragment in the Secondary FIFO 2604 comprising bits 0-31. The output of the Secondary FIFO 2604 consists primarily of two data paths: a Tag ID output path 3100 and a fragment output path 3102, both of which are inputs to the fragment-to-collection converter 2606.
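The 58-bit layout can be sketched as a pair of pack and unpack helpers. The function names are hypothetical; the bit positions (Tag ID in bits 32-57, fragment in bits 0-31) follow the description above.

```python
TAG_MASK = 0x3FFFFFF        # 26 bits
FRAGMENT_MASK = 0xFFFFFFFF  # 32 bits


def pack_secondary(tag_id, fragment):
    """Place the Tag ID in bits 32-57 and the fragment in bits 0-31."""
    return ((tag_id & TAG_MASK) << 32) | (fragment & FRAGMENT_MASK)


def unpack_secondary(word):
    """Recover (tag_id, fragment) from a 58-bit Secondary FIFO word."""
    return (word >> 32) & TAG_MASK, word & FRAGMENT_MASK
```

Splitting the word at bit 32 mirrors the two output paths: the upper field feeds the comparator and the lower field feeds the logic gates.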

Referring now to FIG. 32, there is illustrated a block diagram of a fragment-to-collection converter at the output of the Secondary FIFO of FIG. 28. The three main components of the fragment-to-collection converter 2606 are a 26-bit comparator 3200, thirty-two logic gates 3202, and a 26-bit counter 3204. In operation, the 26-bit counter 3204 is initialized to zero via a RESET line. The 26-bit Tag ID, received across the Tag ID input path 2814 into the Secondary FIFO 2604, is then loaded from the Secondary FIFO 2604 across the Tag ID output path 3100 into the comparator 3200, and the 32-bit collection fragment, received from the R/S Flip-Flops 2804 across the R/S Flip-Flop output path 2816, is loaded from the Secondary FIFO 2604 across the 32-bit fragment path 3102 into the thirty-two logic gates 3202. The 32-bit fragment is obtained from the lower 32 bits of the 58-bit word of the Secondary FIFO 2604, and the logic gates circuit 3202 comprises 32 bits of logic to accommodate the 32-bit collection fragment. The value of the 26-bit counter 3204 is then compared in comparator 3200 with the 26-bit Tag ID. If the two values do not match, a Next signal is then generated and input to the 26-bit counter 3204 and logic gates 3202, causing the 26-bit counter 3204 to increment by one, and the logic gates 3202 to output zeroes. The comparison process between the counter 3204 and the Tag ID is then repeated. If the two values do match, a Match signal is generated and issued to the logic gates 3202 and a clock input of the Secondary FIFO 2604 from the comparator 3200, causing the logic gates 3202 to output the 32-bit fragment as a collection. If there is data still available in the Secondary FIFO 2604, the process then repeats by loading the 26-bit Tag ID into the comparator 3200 and the 32-bit collection into the logic gates circuit 3202 to repeat the comparison process.
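A behavioral sketch of this converter follows. The function name `fragments_to_collection` is hypothetical, and the input is assumed to be a list of (Tag ID, fragment) pairs in ascending Tag ID order. The sketch also advances the counter after each match so that the next comparison starts at the following word position, which is an assumption about how the counter and the output position stay aligned.

```python
def fragments_to_collection(fragments):
    """Expand (tag_id, fragment) pairs into a list of 32-bit collection words."""
    counter = 0             # stands in for the 26-bit counter, reset to zero
    collection = []
    for tag_id, fragment in fragments:
        if tag_id < counter:
            raise ValueError("Tag IDs must be in ascending order")
        while counter != tag_id:
            collection.append(0)    # no match: gates output zeroes
            counter += 1            # Next signal increments the counter
        collection.append(fragment & 0xFFFFFFFF)  # match: output fragment
        counter += 1
    return collection
```

Each zero word covers 32 records with no matches, so the output position of every fragment corresponds to its Tag ID.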

Referring now to FIGS. 33A and 33B, there is illustrated a flowchart of the thread-to-collection conversion process. The process begins at a START block 3300 and moves to a function block 3302 where threads are loaded into the Primary FIFO 2600 and separated into respective ID components of the Tag ID and the BP ID. Flow is then to a decision block 3304 to determine if this is a first-time instance of loading the 26-bit Tag ID into compare circuitry. If so, flow is out the “Y” path to a function block 3306 where the Tag ID is loaded directly into the 26-bit register 2902 and its associated BP ID is used to set the appropriate binary value of each of the R/S Flip-Flops of the R/S Flip-Flops 2804 in accordance with the BP ID. Flow is then to a function block 3308 where the next Tag ID is then obtained from the Primary FIFO 2600 and loaded into the comparator 2904, and its associated BP ID is loaded into the 1-of-32 decoder 2802. Flow is then to a function block 3310 where the contents of the 26-bit register 2902 and the 26-bit comparator 2904 are compared. On the other hand, if it is determined from decision block 3304 that previous comparisons have been made such that the 26-bit register 2902 is loaded, flow is out the “N” path to a function block 3312 where the Tag ID is loaded into the comparator 2904 and its respective BP ID is loaded into the 1-of-32 decoder 2802. Flow is then to function block 3310 where the contents of the 26-bit register 2902 and the comparator 2904 are then compared.

Flow then continues to a decision block 3314 wherein if a match does not occur, flow is out the “N” path to a function block 3316 where the state machine 2900 generates an Output Clock signal to the 26-bit register 2902, and then to a function block 3318 where the state machine 2900 generates a Secondary FIFO Load signal which in conjunction with the Output Clock signal causes the contents of the 26-bit register 2902 and the values in each of the R/S Flip-Flops 3000 of the R/S Flip-Flops 2804 to be loaded into the Secondary FIFO 2604, as indicated in function block 3320. Notably, the values accumulated in the R/S Flip-Flop section 2804 are output only when an unsuccessful match occurs with successive Tag IDs, as this indicates that any further records matching the search criteria are associated with a different thread, which thread has a different Tag ID associated therewith. Flow is then to a function block 3322 where the state machine 2900 then generates a Primary FIFO Fetch signal to the Primary FIFO 2600 to fetch the next thread data. Flow is then to a decision block 3324 to determine if there are any other threads to fetch from the Primary FIFO 2600.

If not, flow is out the “N” path to a function block 3326 where the state machine 2900 generates Set Bit, Output Clock, and Secondary FIFO Load signals, causing the next tag and bit position IDs to be loaded into the FCBA 2602 from the Primary FIFO 2600. Flow is then to a function block 3328 where the 26-bit counter 3204 is then reset to 0 by a reset command. Flow is then to a function block 3330 where the first Tag ID and first 32-bit fragment from the Secondary FIFO 2604 are loaded into the comparator 3200 and logic gates 3202, respectively. Flow is then to a function block 3332 where a comparison is made of the value of the counter 3204 and the Tag ID of the Secondary FIFO 2604. Flow is then to a decision block 3334 to determine if a match has occurred. If not, flow is out the “N” path to a function block 3336 to increment the counter 3204 by 1. Flow is then to a function block 3336 where the logic gates 3202 are configured to output zeroes. Flow is then back to the input of the function block 3332 to then make another comparison based on the change of value of the counter 3204 being incremented.

On the other hand, if a match has occurred in decision block 3334, flow is out the “Y” path to a function block 3340 where the logic gates 3202 are signaled to output the 32-bit fragment as received from the Secondary FIFO 2604 and stored in the logic gates 3202. Flow is then to a decision block 3342 to determine if any more data exists. If not, flow is out the “N” path to a function block 3344 where the last 32-bit fragment is output from the logic gates 3202 and the process stops. On the other hand, if more data exists, as determined in decision block 3342, flow is out the “Y” path to a function block 3346 where the next Tag ID is loaded into the comparator 3200 and a 32-bit fragment from the Secondary FIFO 2604 is loaded into the logic gates 3202. Flow is then to a function block 3348 where the counter 3204 is incremented by 1. Flow is then back to the input of function block 3332 where the values of the counter 3204 and the Tag ID are compared.

Moving back to decision block 3324, if more threads are available from the Primary FIFO 2600, flow is out the “Y” path back to the input of function block 3312 where the next Tag ID is loaded into the 26-bit register 2902 of the Tag ID comparator circuit 2800 and the associated BP ID of the Tag ID is loaded into the 1-of-32 decode logic 2802 to set the values of the R/S Flip Flop 3000 of the R/S Flip-Flops 2804.

Referring back to decision block 3314, if a match has occurred between the contents of the 26-bit register 2902 and the contents of the 26-bit comparator 2904, flow is out the “Y” path to a function block 3350 where the state machine 2900 generates a Set Bit signal and the 1-of-32 decoder 2802 will set the appropriate values of the R/S Flip Flop 3000 of the R/S Flip-Flops 2804, as indicated in function block 3352. Flow is then to a function block 3354 where Fetch and Load commands are issued to obtain the next Tag ID for insertion into the 26-bit comparator 2904 of the Tag ID comparator circuit 2800, and its associated BP ID from the primary FIFO 2600 for insertion into the 1-of-32 decoder 2802. Flow is then back to the input of function block 3310 where a comparison is performed between the contents of the 26-bit register 2902 and the 26-bit comparator 2904.

Referring now to FIG. 34A, there is illustrated a basic thread structure and its constituent ID components, according to the process steps of the flowchart of FIGS. 33A and 33B. For purposes of illustration, four threads (#1, #2, #3, and #4) are used in the example provided in conjunction with the flowchart of FIGS. 33A and 33B. Notably, the discussion of FIGS. 33A and 33B assumes no initial data stored in any of the registers or comparators of the enhanced Boolean processor. As noted hereinabove, the 32-bit thread comprises a most-significant bit which is a sign bit (which is discarded) and 31 bits of ID information (Tag ID and BP ID). In a first block 3302 of FIG. 33A, where the illustrated four threads are read into the Primary FIFO 2600, the 32 bits are identifiable into three groups: the sign bit, 26 bits of Tag ID, and 5 bits of BP ID. The sign bit is ignored for purposes of operation of the Boolean processor in creating collections. Note also that the disclosed Boolean processor embodiment is not restricted to 32-bit systems, but may incorporate 64-bit or larger architectures.
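The three-field layout of the thread word can be sketched in a few lines of Python. This is an illustrative software model only, not the disclosed hardware. The function name `split_thread_word` is hypothetical, and the ordering of the fields from most-significant to least-significant bit (sign, Tag ID, BP ID) is an assumption taken from the order in which the three groups are listed above.

```python
def split_thread_word(word: int) -> tuple[int, int, int]:
    """Split a 32-bit thread word into (sign, tag_id, bp_id).

    Assumed layout (hypothetical, MSB to LSB):
      1 sign bit | 26 Tag ID bits | 5 BP ID bits
    """
    word &= 0xFFFFFFFF                  # keep only the low 32 bits
    sign = word >> 31                   # most-significant bit, ignored downstream
    tag_id = (word >> 5) & 0x3FFFFFF    # next 26 bits
    bp_id = word & 0x1F                 # lowest 5 bits
    return sign, tag_id, bp_id


# Thread #1 of the example: Tag ID 1, BP ID binary 01000.
thread_1 = (1 << 5) | 0b01000
print(split_thread_word(thread_1))
```

Under this assumed layout, the sign bit is simply dropped by the caller, mirroring the statement that it is ignored when creating collections.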

Referring now to FIG. 34B, there is illustrated the contents of various registers according to the process steps of the flowchart of FIGS. 33A and 33B. As indicated hereinabove, for this particular example, it is presumed that the registers were empty prior to the insertion or operation on the aforementioned four threads of FIG. 34A (i.e., operation in the first instance). Therefore, the Tag ID comparator 2904 is considered to be empty prior to loading of the first Thread #1 data into the Boolean processor. Step 3 is associated with function block 3306 where, upon initial Fetch and Load, the Tag ID comparator 2904 is empty such that the Tag ID of Thread #1 from the primary FIFO 2600 is loaded directly into the Tag ID register 2902, and the associated BP ID from the Primary FIFO 2600 is loaded through the 1-of-32 bit decode logic 2802 to immediately set the values of the R/S Flip-Flops 2804 in accordance with the BP ID. (Note that the content of the 26-bit Tag ID register 2902 is a 26-bit binary number, depicted for illustration purposes only as 00 . . . 001, showing the two most-significant bits and three least-significant bits, with the middle string of similar “0” bits omitted.) A BP ID of binary 01000 converts to a decimal number of eight indicating that the ninth flip-flop 3000 of the thirty-two flip-flops 2804 is set to “1.” (The ninth flip-flop is set with a binary BP ID value of 01000, since a BP ID of binary 00000 would indicate that the first flip-flop is to be set.) Therefore, the value in the R/S Flip-Flops 2804 is 0000,0000,0000,0000,0000,0001,0000,0000. (The comma separators in the binary words are inserted for ease of recognizing 4-bit groupings, and will be used throughout in large strings of bits.)

In Step 4, the state machine 2900 generates the Primary FIFO Fetch signal and the next Tag ID of Thread #2 (00,0000,0000,0000,0000,0000,0001) is loaded into the comparator 2904 and its associated bit position ID (01001) is loaded into the 1-of-32 decoder. The contents of the 26-bit register 2902 and the comparator 2904 are then compared, in a Step 5, which is not referenced in FIG. 34B. In Step 6, since the values in the comparator and 26-bit register match, the state machine 2900 generates a Set Bit signal, and the 1-of-32 decoder 2802 sets the appropriate values of the R/S Flip-Flops 3000 in the R/S Flip-Flops 2804, in accordance with the BP ID of Thread #2. In this case, a BP ID of 01001 converts to a decimal value of nine, indicating that the tenth flip-flop of the thirty-two flip-flops 3000 is also set to a binary “1” (resulting in a value of 0000,0000,0000,0000,0000,0011,0000,0000 being accumulated in the R/S Flip-Flops 2804). In Step 7, the state machine 2900 generates the Primary FIFO Fetch signal causing the next Tag ID (00,0000,0000,0000,0000,0000,0011) of Thread #3 to be loaded into the comparator 2904 and its associated bit position ID (00000) loaded into the 1-of-32 decoder 2802. In Step 8, since the values do not match, the state machine 2900 generates the Output Clock and Secondary FIFO Load signals, causing the contents (00,0000,0000,0000,0000,0000,0001) of the 26-bit register 2902 and the accumulated value 0000,0000,0000,0000,0000,0011,0000,0000 set in the R/S Flip Flops 2804 to be loaded into the Secondary FIFO 2604.

In Step 9, the state machine 2900 then generates the Load signal, causing the Tag ID (00,0000,0000,0000,0000,0000,0011) of Thread #3 to be loaded into the 26-bit register 2902, and its associated bit position ID (00000) into the 1-of-32 decoder 2802 to set the appropriate value in the R/S Flip Flops 3000. A BP ID of 00000 results in a decimal value of zero which indicates that the first flip-flop 3000 of the thirty-two flip-flops 2804 should be set to a “1” (resulting in a value of 0000,0000,0000,0000,0000,0000,0000,0001 in the R/S Flip-Flops 2804). In Step 10, the state machine 2900 generates the Primary FIFO Fetch signal, causing the next Tag ID (00,0000,0000,0000,0000,0000,0100) and associated bit position ID (01111) of Thread #4 to be loaded into the FCBA 2602. In Step 11, the values in the 26-bit register 2902 and the comparator 2904 do not match, causing the state machine 2900 to generate the Output Clock and Secondary FIFO Load signals, causing the contents of the 26-bit register 2902 and the value set in the R/S Flip Flops 3000 to be loaded into the Secondary FIFO 2604. A BP ID of 01111 results in a decimal value of fifteen which indicates that the sixteenth flip-flop 3000 of the thirty-two flip-flops 2804 should be set to a “1” (resulting in a value of 0000,0000,0000,0000,1000,0000,0000,0000 in the R/S Flip-Flops 2804). In Step 12, the state machine 2900 generates the Load signal causing the next Tag ID and its associated bit position ID to be loaded into the FCBA 2602. However, since there is no additional data in the primary FIFO 2600, the state machine 2900 generates the Set Bit, Output Clock, and Secondary FIFO Load signals. In Step 13, the 26-bit counter 3204 is initialized to zero, and the first Tag ID (00,0000,0000,0000,0000,0000,0001) is loaded in the comparator 3200. The first 32-bit collection fragment is then loaded into the logic gates 3202.
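The accumulation performed in Steps 3 through 12 can be modeled in software as follows. This is a minimal sketch, not the disclosed circuit: the function name `threads_to_fragments` is hypothetical, each thread is represented as an already-separated (Tag ID, BP ID) pair, and the threads are assumed to arrive grouped by Tag ID, as they do in the four-thread example.

```python
def threads_to_fragments(threads):
    """Group (tag_id, bp_id) pairs into (tag_id, 32-bit fragment) pairs.

    Models the register/comparator behaviour: while the incoming Tag ID
    matches the held Tag ID, the bit selected by the BP ID is set in the
    accumulated word; on a mismatch the held pair is emitted.
    """
    fragments = []
    held_tag = None      # models the 26-bit register
    flip_flops = 0       # models the 32 R/S flip-flops
    for tag_id, bp_id in threads:
        if held_tag is not None and tag_id != held_tag:
            fragments.append((held_tag, flip_flops))  # mismatch: emit and clear
            flip_flops = 0
        held_tag = tag_id
        flip_flops |= 1 << bp_id                      # 1-of-32 decode, set bit
    if held_tag is not None:
        fragments.append((held_tag, flip_flops))      # flush the last fragment
    return fragments


# The four threads of the example: Tag IDs 1, 1, 3, 4 with BP IDs 8, 9, 0, 15.
example = [(1, 0b01000), (1, 0b01001), (3, 0b00000), (4, 0b01111)]
for tag, word in threads_to_fragments(example):
    print(tag, format(word, "032b"))
```

The three emitted words correspond to the flip-flop values given in the text for Tag IDs 1, 3, and 4.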

In Step 14, the values of the counter 3204 (00,0000,0000,0000,0000,0000,0000) and the Tag ID (00,0000,0000,0000,0000,0000,0001) are compared using comparator 3200. Since the values do not match, the Next signal is generated causing the counter 3204 to increment by 1, and the logic gates 3202 to output all zeroes (see FIG. 35, the 1st collection output). In Step 15, the values of the counter 3204 (00,0000,0000,0000,0000,0000,0001) and the Tag ID (00,0000,0000,0000,0000,0000,0001) are compared again. This time the values match causing a Match signal to be generated. When this Match signal is generated, the logic gates 3202 output the 32-bit fragment (see FIG. 35, the 2nd collection output) and cause the next group of data in the Secondary FIFO 2604 to be loaded into the fragment-to-collection converter 2606. The counter 3204 is also incremented by 1. In Step 16, the values of the counter 3204 (00,0000,0000,0000,0000,0000,0010) and the Tag ID (00,0000,0000,0000,0000,0000,0011) are compared. A match does not occur which causes the counter 3204 to increment by 1, and the logic gates 3202 to output zeroes (see FIG. 35, the 3rd collection output). In Step 17, the counter 3204 (00,0000,0000,0000,0000,0000,0011) and Tag ID (00,0000,0000,0000,0000,0000,0011) are compared, and finding a match, the logic gates 3202 output the 32-bit fragment data (see FIG. 35, the 4th collection output), and the next group of data from the Secondary FIFO 2604 is then loaded into the fragment-to-collection converter 2606, and the counter 3204 is incremented by 1. In Step 18, since this is the last thread data of Primary FIFO 2600 for this thread conversion example, the logic gates 3202 will output directly the 32-bit fragment data (i.e., 0000,0000,0000,0000,1000,0000,0000,0000; see FIG. 35, the 5th collection output). Note that in the above example, the steps are shown in a serial fashion, when in practice the actual operation of the hardware will have Steps 9-12 running concurrently with Steps 13-18.
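Steps 13 through 18 amount to expanding a sparse list of tagged fragments into a dense sequence of 32-bit words. The following sketch models that behaviour in software. The name `fragments_to_collection` is hypothetical, and the input is assumed to be a list of (Tag ID, 32-bit fragment) pairs with strictly increasing Tag IDs, which holds for the example but is not stated as a general requirement in the text.

```python
def fragments_to_collection(fragments):
    """Expand (tag_id, fragment) pairs into a dense list of 32-bit words.

    A counter starts at zero. While the counter is below the Tag ID, an
    all-zero word is output and the counter is incremented; on a match the
    fragment itself is output and the counter is incremented.
    """
    words = []
    counter = 0                      # models the 26-bit counter
    for tag_id, fragment in fragments:
        while counter < tag_id:      # mismatch: output zeroes, count up
            words.append(0)
            counter += 1
        words.append(fragment)       # match: output the fragment
        counter += 1
    return words


# Fragments produced by the four-thread example (Tag IDs 1, 3 and 4).
example = [(1, 0x00000300), (3, 0x00000001), (4, 0x00008000)]
for n, word in enumerate(fragments_to_collection(example), start=1):
    print(n, format(word, "032b"))
```

For the example input this yields five words: zeroes, the Tag ID 1 fragment, zeroes, the Tag ID 3 fragment, and the Tag ID 4 fragment, in line with the five collection outputs described for FIG. 35.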

Referring now to FIG. 35, there is illustrated a conversion of the bit binary structure of the 32-bit fragments to the collection output data stream. The fragment-to-collection converter 2606, which has a 32-bit output, operates at the output of the Secondary FIFO 2604. As matches or mismatches are obtained, the logic gates 3202 output either the respective values of the gates 3202, or zeroes. A continuous process results in a bit stream of collections 3500, which collections then constitute a contiguous stream of data (bit vectors) representing record positions, i.e., each bit position in the stream indicates the index of the associated records meeting the desired search criteria. The bit stream of collection(s) 3500 can then be compressed using the disclosed run-length compression techniques, and decompressed using the same architecture.
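To make the "bit position equals record index" interpretation concrete, the sketch below lists the record indexes encoded in a sequence of collection words. It is illustrative only: the name `record_indexes` is hypothetical, and the mapping (bit b of word w, counted from the least-significant bit, represents record index 32·w + b) is an assumption chosen to agree with the flip-flop example, where a BP ID of 00000 sets the rightmost bit.

```python
def record_indexes(collection_words, word_bits=32):
    """Return the record indexes whose bits are set in the collection.

    Assumes bit b (LSB = 0) of word number w stands for record index
    w * word_bits + b.
    """
    indexes = []
    for word_number, word in enumerate(collection_words):
        for bit in range(word_bits):
            if (word >> bit) & 1:
                indexes.append(word_number * word_bits + bit)
    return indexes


# The five collection words of the running example.
print(record_indexes([0, 0x300, 0, 0x1, 0x8000]))
```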

Compressed/Uncompressed Collection Conversion

Compressed collections are handled by the enhanced architecture by first decompressing the compressed data. Decompression, as well as compression, is performed using the following comma codes. The disclosed run-length technique is substantially similar to the comma code set described hereinabove, and describes a binary bit stream run-length compression/decompression process. It adapts the output for both run-length outputs and random pattern (literal) outputs. Short-term “trend” statistics are evaluated to invert the bit stream, if required. The inversion process keeps the compression factor equal for runs of contiguous “1” or “0” bits. Whereas conventional two-pass systems require the inclusion of a conversion key table for translation of the encoded data, the disclosed run-length encoding technique offers a single-pass solution using “comma codes,” and with a stop limit on negative compression, and no need for inclusion of a translation table. (Negative compression is where the resulting encoded output bit stream is bigger than the raw input bit stream.) Negative compression occurs if the output file “code set” is statistically suboptimal. Without a priori knowledge of the file statistics, the possibility of negative compression does exist in the disclosed technique. Compression occurs on any run length of five or more bits. Any run-length of four bits or less is passed through as an uncompressed (literal) code. Run-length counts are “thresholded” into three discrete counter lengths. An end-of-file code uniquely exists as a zero-length literal code. Odd-length file terminations are resolved in both literal mode and run-length mode. A unique code exists for binary stream inversion.

The basic format used for the disclosed compression technique is a variable-length bit code, commonly referred to as a comma code, prefixed to a variable-length compression operator. The disclosed embodiment comprises seven comma codes: a first comma code denoted in the output stream by a single “0” bit (also called an 8-bit literal code), a second comma code denoted in the output stream by a binary “10” (also called a fixed 3-bit run-length counter with an implied “1” bit), a third comma code denoted in the output stream by a binary “110” (also called an inversion code), a fourth comma code denoted in the output stream by a binary “1110” (also called a fixed 6-bit run-length counter with an implied “1” bit), a fifth (or “universal”) comma code denoted in the output stream by a binary “11110” (also called a variable run length with an implied “1” bit), a sixth comma code denoted in the output stream by a binary “111110” (also called a variable run length with no implied “1” bit), and a seventh comma code denoted in the output stream by a binary “111111” (also called a variable literal, and which has the dual purpose of providing an end-of-stream (EOS) termination code). The order in which the comma codes are executed during analysis of 8-bit blocks of the input bit stream is important, and is discussed in greater detail hereinbelow. By using any of the above-mentioned comma codes, any binary stream can be compressed effectively during a single pass.

Referring now to FIG. 36, there are illustrated the comma codes used with the enhanced Boolean processor embodiment. The first comma code 3600 is the 8-bit literal, and outputs a single binary “0” bit 3602. The first comma code 3600 is assigned as a literal output code (“literally” the same uncompressed bits as the input string). The first comma code body 3604 (bits B₁-B₈) of the output literal is fixed at a length of eight bits, since the relational processor analyzes input blocks of eight bits at a time. Fixing the length at eight bits is significant for two reasons. First, the total length of an output literal code is limited to no more than nine bits (the single comma code bit “0” followed by the eight input bits). The first comma code 3600 operates on a threshold of four bits such that when a run length of similar bits fails to exceed three, the literal string image of eight bits is appended to the single comma code binary “0” bit 3602. Thus the worst-case negative compression is limited to 112.5% (computed as the (number of output bits) divided by (number of input bits)=9/8, or 112.5%). Second, the “break even” point for inserting an inversion code is eight bits. The break even point is defined where the output code is the same length as the input code. (The inversion code is discussed in greater detail during the discussion of the third comma code hereinbelow.)
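The 9/8 bound on the literal code can be checked with a short sketch. Bit streams are modeled here as Python strings of "0" and "1" characters purely for readability, and the function name `encode_literal8` is hypothetical.

```python
def encode_literal8(block: str) -> str:
    """Encode one 8-bit input block as a first-comma-code literal.

    The output is the single comma bit "0" followed by the eight input
    bits unchanged, for nine output bits in total.
    """
    if len(block) != 8 or any(bit not in "01" for bit in block):
        raise ValueError("block must be exactly eight '0'/'1' characters")
    return "0" + block


block = "01011010"                      # no run of four or more similar bits
code = encode_literal8(block)
print(code, len(code) / len(block))     # nine bits out for eight bits in
```

Because every literal costs exactly one extra bit per eight input bits, a stream made up entirely of literals expands by the stated worst-case factor of 9/8.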

A second comma code 3606 is the fixed 3-bit run-length counter with an implied “1” bit. The code length is a total of five bits (the two binary 10 bits 3608 plus a fixed 3-bit count 3610 (C₁-C₃)). The second comma code 3606 is assigned to operate on bit streams having short run lengths of four to eleven bits, inclusive (i.e., has a threshold of four bits). The fixed 3-bit count 3610 is the binary representation of the decimal value of the number of bits being compressed. This 3-bit counter code includes an offset of four such that the 3-bit counter code is computed by adding the value of four to the 3-bit table address. For example, if the input bit stream has nine “0” bits which are to be compressed, the value in the fixed 3-bit count 3610 would be a binary representation of a decimal nine offset by a value of four (or binary 101). (It should be noted that the disclosed run-length technique operates to compress zeros. Therefore, run lengths of “1” bits are inverted to zeros for compression. Consequently, a run length of “0” bits is assumed to terminate by the presence of a “1” bit. The terminating “1” bit is also called an “implied” one bit. The implied bit is automatically absorbed into the compressed string since it is known that the string of similar bits terminates at a bit change. When including the implied bit, the actual string length encoded is from 5-12 bits, since the implied “1” bit is included in the bit string for encoding purposes.)

A key issue for this second comma code 3606 is the break even point (also a key issue for all of the other comma codes, for that matter), such that exceeding the break even point results in negative compression. According to this second comma code 3606, an implied “1” bit is assumed at the end of a string of “0” bits. Therefore, a minimum run length of four binary zero bits with a trailing “1” bit (as specified for this comma code) represents a minimum run length that can be encoded without any negative compression. Since an input stream having a run length less than four bits (plus a trailing implied bit) would be less than the output code which is stipulated at five bits, negative compression would occur.
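The offset-of-four arithmetic and the break even point of the second comma code can be illustrated as follows. This is a sketch under stated assumptions: bit streams are modeled as strings, the function names are hypothetical, the run length counts the “0” bits only (the implied “1” bit is not counted), and the 3-bit count is written most-significant bit first.

```python
def encode_fixed3(zero_run: int) -> str:
    """Encode a run of 4..11 zeros (plus implied '1') as '10' + 3-bit count."""
    if not 4 <= zero_run <= 11:
        raise ValueError("fixed 3-bit counter covers runs of 4 to 11 zeros")
    return "10" + format(zero_run - 4, "03b")   # count is the run length minus four


def decode_fixed3(code: str) -> str:
    """Expand a 5-bit second comma code back to zeros plus the implied '1'."""
    if len(code) != 5 or not code.startswith("10"):
        raise ValueError("not a fixed 3-bit run-length code")
    zero_run = int(code[2:], 2) + 4
    return "0" * zero_run + "1"


print(encode_fixed3(9))    # nine zeros: count field is binary 101
print(decode_fixed3("10000"))
```

A run of four zeros with its implied “1” bit is five input bits and encodes to five output bits, which is the break even case described above; any longer run within the range produces net compression.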

The third comma code 3612 is the inversion code, and outputs a binary 110. It has a fixed length of three bits. The third comma code 3612 is inserted into the output data stream to indicate that the bit trend is opposite to what is currently being processed. The third comma code 3612 is applied when a string of contiguous “1” bits exceeds seven bits (a threshold of eight “1” bits) in length (since strings of zeros are more likely to occur, inversion of 1-bits to zeros is desirable to extend the compression of the bit stream). Application of the third comma code 3612 triggers use of another comma code which provides compression of the run length of similar bits. For example, if the run length of similar bits is less than twelve, the fixed 3-bit run-length counter is used; if the run length is less than seventy-six similar bits, a fixed 6-bit run length counter is used; and if the run length exceeds seventy-five bits, a variable run length comma code is used.
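The selection among the run-length codes by run length can be written as a simple range test. The sketch below is illustrative only: the function name and the returned labels are hypothetical, the run length is taken to exclude the implied terminating bit, and runs shorter than four bits are assumed to fall through to literal handling, consistent with the threshold of the second comma code.

```python
def choose_run_length_code(run_length: int) -> str:
    """Pick the comma code family for a run of similar bits."""
    if run_length < 0:
        raise ValueError("run length cannot be negative")
    if run_length < 4:
        return "literal"                  # too short to compress
    if run_length < 12:
        return "fixed 3-bit counter"      # runs of 4 to 11
    if run_length < 76:
        return "fixed 6-bit counter"      # runs of 12 to 75
    return "variable run length"          # runs of 76 and above


for n in (3, 4, 11, 12, 75, 76):
    print(n, choose_run_length_code(n))
```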

The threshold is determined by the concatenation of the fixed 3-bit counter code 3606, which has five bits, to the inversion code 3612, which has three bits. As an example, where a string of “0” bits was just processed but now a string of “1” bits appears to be the current trend, an inversion code 3612 will be inserted in the output stream to note the point at which the bits toggled from 0's to 1's. The inversion code 3612 must be inserted into the output data file to indicate compression in “inverted” mode. The actual fixed 3-bit run-length code 3606 appended to the inversion code 3612 depends on the final run length count. The stream inversion code 3612 toggles the state of the inversion FLAG from an initial state of zero. Note that the inversion FLAG also affects literal values. More information is provided hereinbelow during analysis of the bit stream adaptive inversion.

The fourth comma code 3614 is the fixed 6-bit run-length counter with an implied “1” bit. The code length is ten bits (four binary 1110 bits 3616 for the code and six bit places for the 6-bit count 3618 (C₁-C₆)). The fixed 6-bit count 3618 is the binary representation of the decimal value of the number of bits being compressed. The fourth comma code 3614 is a bridge between the second comma code 3606 (i.e., the fixed 3-bit count) and the variable run-length code (a fifth comma code, discussed hereinbelow), and is used when the run length of similar bits is from 12-75 bit places, inclusive. An implied “1” bit is assumed to terminate the run count. This fixed 6-bit run-length counter code has a threshold (offset) of twelve bit places. Six binary bits can represent 2⁶, or decimal sixty-four, distinct values (0 through 63). Therefore, the limit of the code is 12+(2⁶−1)=75 bits.
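The range of the fourth comma code follows directly from the offset of twelve and the six-bit count field, as the sketch below shows. The function name `encode_fixed6` is hypothetical, bit streams are modeled as strings, and the count is assumed to be written most-significant bit first.

```python
def encode_fixed6(zero_run: int) -> str:
    """Encode a run of 12..75 zeros (plus implied '1') as '1110' + 6-bit count."""
    if not 12 <= zero_run <= 12 + (2**6 - 1):
        raise ValueError("fixed 6-bit counter covers runs of 12 to 75 zeros")
    return "1110" + format(zero_run - 12, "06b")   # count is the run length minus twelve


print(encode_fixed6(12))   # smallest run: count field 000000
print(encode_fixed6(75))   # largest run: count field 111111
```

Both ends of the range produce a ten-bit code, so a run of 75 zeros and its implied “1” bit (76 input bits) is reduced to ten output bits.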

The fifth comma code 3620 is the variable run length code with an implied “1” bit (also called the “universal” code, since any run length can be encoded using it). The code length is from 17-41 bits, inclusive, and consists of five binary 11110 bits 3622 for indicating the variable run length code, a 5-bit counter modulus 3624 (C₁-C₅), and a variable length field 3626 of 7-31 bits, inclusive. An implied “1” bit is assumed at the end of the run length stream. The fifth comma code 3620 has a threshold of 76 bits and a limit of 2³¹−1 bits. It accomplishes this by “trimming” the counter 3624 length to that which is actually required to represent the run-length count. The trimming is accomplished by a fixed 5-bit field referred to as the “counter modulus.” This comma code is used when the run length of similar bits is from 76 to 2,147,483,647 bit places, inclusive. This variable run-length code has an optimal threshold (offset) of seventy-six bit places.

For example, a bit string of seventy-eight zeros and an implied termination bit of “1” will be represented at the output as 11110 00111 1001110 (spaces added for clarity, only). The first five bits (11110) indicate the code 3622 which represents that the variable length comma code (with implied “1” bit) is used; the next five bits (00111) are the 5-bit counter modulus 3624, which is a binary representation of the decimal value for the number of bit places (seven or binary 111) required in the following variable length field 3626. The variable length field 3626 is a binary representation of the number of bit places compressed. In this example, seventy-eight zeros were compressed, so the binary number placed in the variable length field 3626 is 1001110.
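The seventy-eight-zero example can be reproduced with a short encoder and decoder for the fifth comma code. This is a sketch, not the disclosed circuit: the function names are hypothetical, bit streams are modeled as strings, and the variable length field is assumed to hold the raw run length (no offset), most-significant bit first, padded to at least seven bits, as the worked example suggests.

```python
PREFIX = "11110"   # fifth ("universal") comma code


def encode_universal(zero_run: int) -> str:
    """Encode a run of zeros (plus implied '1') with the universal code."""
    if not 1 <= zero_run <= 2**31 - 1:
        raise ValueError("run length must be between 1 and 2**31 - 1")
    width = max(7, zero_run.bit_length())          # field is 7 to 31 bits wide
    modulus = format(width, "05b")                 # 5-bit counter modulus
    return PREFIX + modulus + format(zero_run, f"0{width}b")


def decode_universal(code: str) -> int:
    """Return the run length carried by a universal code."""
    if not code.startswith(PREFIX):
        raise ValueError("not a universal run-length code")
    width = int(code[5:10], 2)
    return int(code[10:10 + width], 2)


code = encode_universal(78)
print(code[:5], code[5:10], code[10:])   # prefix, counter modulus, count field
print(decode_universal(code))
```

The shortest code (a 7-bit field) is 17 bits and the longest (a 31-bit field) is 41 bits, matching the stated code length range.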

The sixth comma code 3628 is substantially similar to the fifth comma code 3620, except that there is no implied bit at the end of the bit stream count (i.e., the last bit in the input stream was a “0” bit). This code is used only to encode a bit stream at the end of the input file. A further implication is that the end-of-stream code will immediately follow. The sixth comma code 3628 has a code length of 18-42 bits, inclusive, and consists of six binary 111110 bits 3630 for indicating the variable run length code without an implied bit, a 5-bit counter modulus 3632 (C₁-C₅), and a variable length field 3634 of 7-31 bits, inclusive. The sixth comma code 3628 has an optimal threshold of seventy-six bits and a limit of 2³¹−1 bits. However, it can be used even if the run length is less than the optimal threshold. The sixth comma code 3628 is used only when the last code to output ends in a “0” bit. This code always precedes an end-of-stream code (mentioned in greater detail hereinbelow).

The seventh comma code 3636 serves a dual purpose. In a first instance, it is used to “clean up” any stray bits, as would occur in a partial literal (any number of bits pulled in at the input that is less than eight bits). It is used for end-of-file cleanup where an odd length literal is required to flush out the final bit stream elements of the output. As mentioned hereinabove, the first comma code 3600 is the 8-bit literal which encodes eight bits. Therefore, fewer than eight bits can be encoded with this seventh comma code 3636. The seventh comma code 3636 has a code length of 9-16 bits, inclusive, and consists of six binary 111111 bits 3638 for indicating the variable literal code, a 3-bit counter modulus 3640 (C₁-C₃), and a variable length field 3642 of 0-7 bit places, inclusive. The seventh comma code 3636 has a threshold of four bits. To identify the literal bit stream length, a 3-bit count 3640 follows the code 3638. The actual literal input stream of less than eight bits then follows the 3-bit count 3640. In a second instance, the seventh comma code 3636 provides an end-of-stream termination (EOS) code 3644. The EOS code 3644 has a length of nine bits and is a binary 111111000. The existence of a partial literal of zero length permits the encoding of a unique code to signify the “end-of-stream” for the compressed output. This is the final code applied to a compressed output stream, and is a special case of the variable literal code of the first instance where the length is zero. Bits which are “0” may be appended to this code to bring the output to a fixed 32-bit word for I/O purposes. The comma code types are summarized in the following Table 6.

TABLE 6
Summary of the Run-Length Compression/Decompression Codes

                                                          Comma Code Binary Bit Place
Code Type                                                 B0  B1  B2  B3  B4  B5
1. A single zero bit for an 8-bit literal                 0
2. Fixed 3-bit counter (run length of 4-11 bits)          1   0
3. Inversion code (fixed 3 bits)                          1   1   0
4. Fixed 6-bit counter (with implied “1” bit)             1   1   1   0
5. Universal code (7-31 bits) (with implied “1” bit)      1   1   1   1   0
6. Variable run length (7-31 bits) (no implied “1” bit)   1   1   1   1   1   0
7. Variable literal                                       1   1   1   1   1   1
8. End-of-stream termination code (bits B0-B5 of the      **  **
   variable literal with three “0” bits appended -
   9 bits total)
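Because each comma code is a run of “1” bits closed by a “0” bit (or six “1” bits for the variable literal), a decoder can identify the code type by counting leading ones. The sketch below shows that classification together with the variable literal and end-of-stream encoding. The function names and the labels in `CODE_NAMES` are hypothetical, bit streams are modeled as strings, and the 3-bit length is assumed to be written most-significant bit first.

```python
CODE_NAMES = [
    "8-bit literal",              # 0
    "fixed 3-bit counter",        # 10
    "inversion",                  # 110
    "fixed 6-bit counter",        # 1110
    "universal",                  # 11110
    "variable run, no implied",   # 111110
    "variable literal",           # 111111
]


def classify_prefix(bits: str) -> tuple[str, int]:
    """Return (code name, prefix length) for the comma code at the start of bits."""
    ones = 0
    for bit in bits[:6]:
        if bit != "1":
            break
        ones += 1
    if ones < 6 and len(bits) <= ones:
        raise ValueError("bit stream ends inside a comma code prefix")
    return CODE_NAMES[ones], (6 if ones == 6 else ones + 1)


def encode_variable_literal(stray_bits: str) -> str:
    """Encode 0..7 leftover bits; zero leftover bits gives the EOS code."""
    if len(stray_bits) > 7:
        raise ValueError("a variable literal carries at most seven bits")
    return "111111" + format(len(stray_bits), "03b") + stray_bits


def pad_to_word(code: str, word_bits: int = 32) -> str:
    """Append '0' bits so the output fills whole words."""
    return code + "0" * (-len(code) % word_bits)


eos = encode_variable_literal("")
print(eos, classify_prefix(eos), len(pad_to_word(eos)))
```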

Referring now to FIG. 37, there is illustrated a system embodiment of enhanced Boolean processors. To enhance the query throughput even more, the compression/decompression system 3700 may be structured to handle large numbers of queries from one or more databases. Such a configuration is a reality in companies having large telephone support departments, which can be financial institutions, computer support operations, or any function requiring large numbers of nearly simultaneous database queries. These databases may be located independently over data networks such as LANs, WANs, or even global communication networks (e.g., the Internet). In any case, large numbers of database queries present a heavy load on systems. It can be appreciated that a system having independent multichannel relational processing capability would greatly enhance query throughput. While one relational engine is performing recursive operations, another may be expanding or compressing super collections, and still another relational engine may be performing thread conversion to collections. Therefore, the companding system comprises a number of relational engine circuits which can perform independently or cooperatively on a number of incoming database queries.

The system 3700 provides such an architecture and comprises one or more relational engine circuits (1, 2, . . . , N) 3702, 3704, and 3706 interfacing through respective interface circuits 3708, 3710, and 3712 to a common bus 3714. Each relational engine 3702, 3704, and 3706 comprises one or more enhanced Boolean processors (BP) 3703, 3705, and 3707 which receive 32-bit words across one or more 32-bit wide buses from respective input subsections 3722 (similar to input subsections 500, 502, 504, and 506). The input channel subsections 3722 are used for the conversion of threads to collections, recursive processing, and compression/decompression of input streams. The common bus 3714 may be any bus architecture, for example, a PCI bus used in computer systems. The common bus 3714 may have any number of devices connected thereto, but in this example, a CPU 3716 having an associated memory 3718 is used to process records stored on a database 3720. (Note that the CPU 3716, memory 3718, and database 3720 are similar to the CPU 406, memory 408 and database 410 mentioned hereinabove.) It should also be noted that the disclosed architecture is not limited to a single CPU 3716, but is also operable to work with a plurality of CPUs 3716 (e.g., also CPU 3724), memories 3718, and databases 3720 (e.g., also database 3721).

It can be appreciated that loss of any bit of the encoded bit stream will destroy the effectiveness of the compression technique. Therefore, error detection techniques such as CRC should be used when transmitting over great distances (e.g., computer networks). Furthermore, all compression can be done using the universal comma codes; however, the efficiency increases by adding the 3-bit and 6-bit comma codes.
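As one way of applying the CRC suggestion, the sketch below appends a checksum to a compressed payload before transmission and verifies it on receipt. The text names CRC only generically; the choice of the 32-bit CRC provided by Python's standard `zlib.crc32`, the 4-byte big-endian trailer, and the function names `add_crc` and `check_crc` are all assumptions made for illustration.

```python
import zlib


def add_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 trailer to the compressed payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(frame: bytes) -> bytes:
    """Verify the trailer and return the payload, or raise on corruption."""
    if len(frame) < 4:
        raise ValueError("frame too short to contain a CRC trailer")
    payload, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != trailer:
        raise ValueError("CRC mismatch: compressed stream is corrupted")
    return payload


frame = add_crc(b"\x1e\x79\xc0")          # some compressed bytes
print(check_crc(frame) == b"\x1e\x79\xc0")
```

A receiver would call `check_crc` before decompression, so that a damaged stream is rejected rather than decoded into an incorrect collection.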

Although the preferred embodiment has been described in detail, it should be understood that various changes, substitutions and alterations can be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

1. A relational processor, comprising: one or more input subsections for converting parallel input data to serial output data, each of said one or more subsections having a parallel input for receiving said parallel input data and a respective subsection output for outputting said serial output data; a plurality of Boolean processors for processing said serial output data into processed output data, wherein said plurality of Boolean processors are each operatively connected to said subsection output of said one or more subsections to receive said serial output data; and a data routing system connected to a respective processor output of each of said plurality of Boolean processors, said data routing system for routing said processed output data to one or more destination circuits; wherein the relational processor processes said input data in a single pass.
 2. The processor of claim 1, wherein said data routing system includes a compression/decompression circuit.
 3. The processor of claim 2, wherein said compression/decompression circuit operates on said processed output data by either compressing or decompressing said processed output data for output to said one or more destination circuits which comprise external systems, or decompressing compressed data for input to said one or more input subsections.
 4. The processor of claim 2, wherein said compression/decompression circuit compresses said processed output data in accordance with one or more comma codes, and decompresses compressed data in accordance with said one or more comma codes.
 5. The processor of claim 4, wherein said comma codes are executed in a predetermined order during analysis of said parallel input data by the relational processor.
 6. The processor of claim 1, wherein said input data is in the form of a thread which is formatted as one or more multi-bit fragments compatible with said parallel inputs of said one or more input subsections.
 7. The processor of claim 6, wherein each said fragment comprises a sequential number of bits whose bit positions are record indexes in a database, which said record indexes having a bit value of one are further represented as a stream of words each comprising a first set of bits designating a tag ID and a second set of bits designating a bit position ID.
 8. The processor of claim 7, wherein said tag ID associates said record index with a particular fragment, and said bit position ID associates said bit position with a specific record index of said database of record indexes.
 9. The processor of claim 1, wherein said data routing system includes a recursion function such that said processed output data from said plurality of Boolean processors is fed back to a select one of said one or more input subsections for further processing by the relational processor.
 10. The processor of claim 1, wherein thread data input to the relational processor is converted to uncompressed collection data by buffering a thread word of said thread data into a tag ID portion and a bit position ID portion, said tag ID portion associating one or more record indexes with said thread data, and said bit position ID associated one of said one or more record indexes.
 11. The processor of claim 10, wherein a comparison operation is performed on successive said thread words in order to determine the total number of record indexes associated with said thread data.
 12. The processor of claim 1, wherein thread data, compressed collection data, and uncompressed collection data can be processed therewith.
 13. The processor of claim 1, wherein each of said plurality of Boolean processors is a bitmap memory which receives the same opcode, which said opcode is uniquely associated with a Boolean operation being performed by said plurality of Boolean processors on said serial output data received from said output of said one or more input subsections.
 14. The processor of claim 1, wherein each said Boolean processor of said plurality of Boolean processors processes four input bits to output a single bit, which said single bit of each of said plurality of Boolean processors is output to a register to create a processed multi-bit word, said processed multi-bit word passed to said data routing system for routing to said one or more destination circuits.
 15. The processor of claim 1, wherein 32-bit words can be input to each said parallel input for processing by the relational processor.
 16. A method of processing with a relational processor, comprising the steps of: converting parallel input data to serial output data using one or more input subsections, each of the one or more subsections having a parallel input for receiving the parallel input data and a respective subsection output for outputting the serial output data; processing the serial output data into processed output data with a plurality of Boolean processors, wherein the plurality of Boolean processors are each operatively connected to the subsection outputs of the one or more input subsections to receive the serial output data; and routing the processed output data with a data routing system connected to a processor output of each of the plurality of Boolean processors to route data therefrom to one or more destination circuits; wherein the relational processor processes the input data in a single pass.
 17. The method of claim 16, wherein the data routing system in the step of routing includes a compression/decompression circuit.
 18. The method of claim 17, wherein the compression/decompression circuit operates on the processed output data by either compressing or decompressing the processed output data for output to the one or more destination circuits which comprise external systems, or decompressing compressed data for input to the one or more input subsections.
 19. The method of claim 16, wherein the compression/decompression circuit compresses the processed output data in accordance with one or more comma codes, and decompresses compressed data in accordance with the one or more comma codes.
 20. The method of claim 19, wherein the comma codes are executed in a predetermined order during analysis of the parallel input data by the relational processor.